We propose a new protocol of universal entanglement concentration, which converts many copies
of an unknown pure state to an exact maximally entangled state. The yield of the protocol, which
is output as classical information, is probabilistic, and achieves the entropy rate with high
probability, just as non-universal entanglement concentration protocols do.
I. INTRODUCTION

Conversion of a given partially entangled state to a maximally entangled state by local operations and classical
communication (LOCC) is an important task in quantum information processing, both in application and in theory. If
the given state is a pure state, such protocols are called entanglement concentration, while the mixed state versions are
called entanglement distillation.

In this paper, we study universal entanglement concentration, that is, entanglement concentration protocols which take
unknown states as input, and discuss the optimal yield in higher order asymptotic theory or in non-asymptotic theory,
depending on the settings.

We study universal entanglement concentration rather than universal entanglement distillation in order to analyze
the optimal yield in detail, in comparison with non-universal protocols, which have been studied in detail
[6, 12, 15, 18, 19]. Note that the study of optimal entanglement distillation is still under development even in non-universal
settings. For example, the known formula for the optimal first order rate still includes optimization over LOCC in
one-copy space. This is in sharp contrast with the study of entanglement concentration, for which non-asymptotic and
higher order asymptotic formulas in various forms have been obtained.

As demonstrated by Bennett et al. [3], if many copies of a known pure state are given, the optimal asymptotic yield
equals the entropy of entanglement of the input state [18]. To achieve the optimum, both parties apply projections
onto the typical subspaces of the reduced density matrix of the given partially entangled pairs (BBPS protocol, hereafter).
Obviously, this protocol is not applicable when the Schmidt basis is unknown. Of course,
one can estimate the necessary information by measuring some of the given copies. In such a protocol, however, the final
state is not quite a maximally entangled state, because errors in the estimation of the Schmidt basis cause distortions.

This paper proposes a protocol, denoted by $\{C_*^n\}$, of universal distortion-free entanglement concentration, in which
exact (not approximate) maximally entangled states are produced out of identically prepared copies of an unknown pure state.
Its yield is probabilistic (so users cannot predict the yield beforehand), but the protocol outputs the amount of the
yield as classical information (so that users know what they have obtained), and the rate of the yield asymptotically
achieves the entropy of entanglement with probability close to unity.

A key to construction of our protocol is symmetry; an ensemble of identically prepared copies of a state is left 
unchanged by simultaneous reordering of copies at each site. This symmetry gives rise to entanglement which is 
accessible without any information about the Schmidt basis. 

In some applications, small distortion in the outputs might be acceptable, and estimation-based protocols might suffice,
because the entropy rate is achieved anyway. In higher order asymptotic terms and in non-asymptotic evaluations, however,
we will prove that our protocol is better than any other protocol, including those which allow small distortion.

In the proof, the following observation simplifies the problem to a large extent. Let us concentrate on the optimization
of the worst-case value of performance measures over all unknown Schmidt bases, because the uncertainty about the
Schmidt basis is the main difficulty of universal entanglement concentration. We also assume that performance measures
are not increased by postprocessing which decreases the Schmidt rank of the output maximally entangled state.

With such reasonable restrictions, an optimal protocol is always found in the class of protocols which are the
same as $\{C_*^n\}$ in output quantum states, but may differ in classical output. Therefore, no attempt to improve
$\{C_*^n\}$ can change the real yield. What can be 'improved' is the information about how much yield was produced.
Since $\{C_*^n\}$ outputs the information about the yield correctly, there should be no room for 'improvement' in this part,
either. This observation assures us that $\{C_*^n\}$ is optimal if the criterion is fair. In addition, the optimization is now
straightforward, for we have to optimize only the classical part of the protocol.
Based on this observation, we prove the optimality in terms of a natural class of measures: monotone increasing
measures which are bounded over the range and continuously differentiable except at finitely many points. In terms
of such measures, the distortion-free condition trivially implies the non-asymptotic optimality of our protocol, while the
constraint on the distortion implies that our protocol is optimal up to the higher orders. Also, (a kind of) non-asymptotic
optimality is proved for some performance measures which vary with both yield and distortion. These
results assure us that our protocol $\{C_*^n\}$ is the best universal entanglement concentration protocol.

Here, we stress that most of these results generalize to the case where Schmidt coefficients of an input are known 
and its Schmidt basis is unknown. 

In the end, we prove that the classical output of our protocol gives an asymptotically optimal estimate of entropy 
of entanglement. Surprisingly, this estimate is not less accurate than any other estimate based on (potentially global) 
measurement whose construction depends on the Schmidt basis of the unknown state. 

Considering its optimal performance in very strong senses, it is surprising that our protocol does not use any 
classical communication at all. 

The paper is organized as follows. After introducing symbols and terms in Section II, we describe implications
of the permutation symmetry and construct the protocol $\{C_*^n\}$ (Subsection III A). Its asymptotic performance is
analyzed using known results of group representation theoretic type theory in Subsection III B, followed by a comparison
with an estimation-based protocol in Subsection III C.

Optimality of the protocol is discussed in Section IV. Subsection IV A gives the definitions of the measures of performance
and a short description of the proved assertions. The key lemma, which restricts the class of protocols of interest to a large
extent, is proved in Subsection IV B. Subsections IV C-IV G treat the proofs of optimality in each setting.

The estimation theoretic application of the protocol is discussed in Section V. Interestingly, a part of the arguments in
this section gives another proof of the optimality of $\{C_*^n\}$ in terms of the error exponent.

In the appendices, we demonstrate several technical lemmas and formulas. Among them, an asymptotic formula
for the average yield of non-universal entanglement concentration protocols has, so far as we know, not been shown before, and
might be useful for other applications.

II. DEFINITIONS 

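Given an entangled pure state $|\phi\rangle \in \mathcal{H}_A \otimes \mathcal{H}_B$ ($\dim \mathcal{H}_A = \dim \mathcal{H}_B = d$), we denote its Schmidt coefficients by $\mathbf{p}_\phi = (p_{1,\phi}, \ldots, p_{d,\phi})$ ($p_{1,\phi} \ge p_{2,\phi} \ge \cdots \ge p_{d,\phi} \ge 0$) and its Schmidt basis by $\{|e^{\phi}_{i,x}\rangle\}$ ($x = A, B$), respectively. The entropy of entanglement of $|\phi\rangle$ equals the Shannon entropy $H$ of the probability distribution $\mathbf{p}_\phi$, where the Shannon entropy $H$ is defined by $H(\mathbf{p}) := \sum_i -p_i \log p_i$. (Throughout the paper, the base of log is 2.) In this paper, our main concern is concentration of maximal entanglement from $|\phi\rangle^{\otimes n}$ by LOCC. We denote a maximally entangled state with the Schmidt rank $L$ by

$$\|L\rangle := \frac{1}{\sqrt{L}} \sum_{i=1}^{L} |f_{i,A}\rangle |f_{i,B}\rangle,$$

where $\{|f_{i,x}\rangle\}$ is an orthonormal basis in $\mathcal{H}_x^{\otimes n}$ ($x = A, B$). Note that $\{|f_{i,x}\rangle\}$ need not be explicitly defined, for the difference between $\frac{1}{\sqrt{L}} \sum_{i=1}^{L} |f_{i,A}\rangle |f_{i,B}\rangle$ and $\frac{1}{\sqrt{L}} \sum_{i=1}^{L} |f'_{i,A}\rangle |f'_{i,B}\rangle$ is compensated by a local unitary. One can optimally produce $\|2^{nH(\mathbf{p}_\phi)}\rangle$ from $|\phi\rangle^{\otimes n}$ by LOCC with high probability and high fidelity, if $n$ is very large [3, 18].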
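As a small numerical illustration of the definitions above, the following sketch computes the entropy of entanglement from a list of Schmidt coefficients and the exponent $nH(\mathbf{p}_\phi)$ of the asymptotically reachable Schmidt rank. The function names are hypothetical and chosen only for this example; they do not come from the paper.

```python
import math


def entropy_of_entanglement(schmidt_coeffs):
    """Shannon entropy (base 2) of the Schmidt coefficients p_1, ..., p_d."""
    if min(schmidt_coeffs) < 0 or abs(sum(schmidt_coeffs) - 1.0) > 1e-9:
        raise ValueError("Schmidt coefficients must form a probability distribution")
    # Terms with p = 0 contribute nothing (0 log 0 is taken as 0).
    return -sum(p * math.log2(p) for p in schmidt_coeffs if p > 0)


def target_log_rank(schmidt_coeffs, n):
    """log2 of the Schmidt rank 2^{nH(p)} targeted when n copies are given."""
    return n * entropy_of_entanglement(schmidt_coeffs)


if __name__ == "__main__":
    p = [0.7, 0.3]
    print(entropy_of_entanglement(p), target_log_rank(p, 100))
```

For a maximally entangled two-qubit input the entropy is one bit per copy, and for a product state it is zero.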

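In this paper, an entanglement concentration $\{C^n\}$ is a sequence of LOCC measurements, in which $C^n$ takes $n$ copies $|\phi\rangle^{\otimes n}$ of an unknown state as its input. With probability $Q^{\phi}_{C^n}(x)$, $C^n$ outputs $\rho^{\phi}_{C^n}(x)$, which is meant to be an approximation to $\|2^{nx}\rangle$, together with $x$ as classical information.

The worst-case distortion $\epsilon^{\phi}_{C^n}$ is the maximum of the square of the Bures distance between the output $\rho^{\phi}_{C^n}(x)$ and the target $\|2^{nx}\rangle$,

$$\epsilon^{\phi}_{C^n} := 1 - \min_x \langle 2^{nx}\| \rho^{\phi}_{C^n}(x) \|2^{nx}\rangle,$$

while $\bar{\epsilon}^{\phi}_{C^n}$ denotes the average distortion,

$$\bar{\epsilon}^{\phi}_{C^n} := 1 - \mathrm{E}^{\phi}_{Q_{C^n}} \langle 2^{nX}\| \rho^{\phi}_{C^n}(X) \|2^{nX}\rangle,$$

where $\mathrm{E}^{\phi}_{Q_{C^n}}$ means the average with respect to $Q^{\phi}_{C^n}$,

$$\mathrm{E}^{\phi}_{Q_{C^n}} f(X) := \sum_x Q^{\phi}_{C^n}(x) f(x).$$

A protocol is said to be distortion-free if $\epsilon^{\phi}_{C^n} = \bar{\epsilon}^{\phi}_{C^n} = 0$ holds for all $\phi$.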

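III. CONSTRUCTION OF THE PROTOCOL $\{C_*^n\}$

A. Symmetry and the protocol $\{C_*^n\}$

In the construction of $\{C_*^n\}$, we exploit two kinds of symmetries. First, our input, $|\phi\rangle^{\otimes n}$, is invariant under the reordering of copies, or the action of the permutation $\sigma$ in the set $\{1, \ldots, n\}$ such that

$$\bigotimes_{i=1}^{n} |h_{i,A}\rangle |h_{i,B}\rangle \mapsto \bigotimes_{i=1}^{n} |h_{\sigma^{-1}(i),A}\rangle |h_{\sigma^{-1}(i),B}\rangle,$$

where $|h_{i,x}\rangle \in \mathcal{H}_x$ ($x = A, B$). (Hereafter, the totality of permutations in the set $\{1, \ldots, n\}$ is denoted by $S_n$.) Second, an action of the local unitary transform $U^{\otimes n} \otimes V^{\otimes n}$ ($U, V \in \mathrm{SU}(d)$) corresponds to a change of the Schmidt basis. The action of these groups induces a decomposition of the tensored space $\mathcal{H}_x^{\otimes n}$ ($x = A, B$) [20] into

$$\mathcal{H}_x^{\otimes n} = \bigoplus_{\mathbf{n}} \mathcal{W}_{\mathbf{n},x}, \quad \mathcal{W}_{\mathbf{n},x} := \mathcal{U}_{\mathbf{n},x} \otimes \mathcal{V}_{\mathbf{n},x} \quad (x = A, B), \qquad (1)$$

where $\mathcal{U}_{\mathbf{n},x}$ and $\mathcal{V}_{\mathbf{n},x}$ are irreducible spaces of the tensor representation of $\mathrm{SU}(d)$ and of the representation of the group of permutations, respectively, and

$$\mathbf{n} = (n_1, \ldots, n_d), \quad \sum_{i=1}^{d} n_i = n, \quad n_i \ge n_{i+1} \ge 0 \qquad (2)$$

is called the Young index, to which $\mathcal{U}_{\mathbf{n},x}$ and $\mathcal{V}_{\mathbf{n},x}$ uniquely correspond. In the case of a spin-$\frac{1}{2}$ system, $\mathcal{W}_{\mathbf{n},x}$ is an eigenspace of the total spin operator. Due to the invariance under the permutations, any $n$-tensored state $|\phi\rangle^{\otimes n}$ is decomposed in the following form.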



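Lemma 1

$$|\phi\rangle^{\otimes n} = \bigoplus_{\mathbf{n}} \sqrt{a^{\phi}_{\mathbf{n}}}\, |\phi_{\mathbf{n}}\rangle \otimes |\mathcal{V}_{\mathbf{n}}\rangle,$$

where $|\phi_{\mathbf{n}}\rangle$ is a state vector in $\mathcal{U}_{\mathbf{n},A} \otimes \mathcal{U}_{\mathbf{n},B}$, $a^{\phi}_{\mathbf{n}}$ is a complex number, and $|\mathcal{V}_{\mathbf{n}}\rangle$ is a maximally entangled state in $\mathcal{V}_{\mathbf{n},A} \otimes \mathcal{V}_{\mathbf{n},B}$ with the Schmidt rank $\dim \mathcal{V}_{\mathbf{n},A}$. While $|\phi_{\mathbf{n}}\rangle$ and $a^{\phi}_{\mathbf{n}}$ depend on the input $|\phi\rangle$, $|\mathcal{V}_{\mathbf{n}}\rangle$ does not depend on the input.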
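The sizes of the blocks in the decomposition (1) can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the paper: it uses the standard hook length formula for $\dim \mathcal{V}_{\mathbf{n}}$ and the standard Weyl dimension formula for $\dim \mathcal{U}_{\mathbf{n}}$, and the function names are hypothetical. Summing $\dim \mathcal{U}_{\mathbf{n}} \cdot \dim \mathcal{V}_{\mathbf{n}}$ over all Young indices should give $d^n$.

```python
from math import factorial


def dim_V(young):
    """Dimension of the S_n irreducible space V_n (hook length formula)."""
    n = sum(young)
    rows = [r for r in young if r > 0]
    hooks = 1
    for i, row in enumerate(rows):
        for j in range(row):
            arm = row - j - 1                                # boxes to the right
            leg = sum(1 for r in rows[i + 1:] if r > j)      # boxes below
            hooks *= arm + leg + 1
    return factorial(n) // hooks


def dim_U(young):
    """Dimension of the SU(d) irreducible space U_n (Weyl dimension formula)."""
    d = len(young)
    num, den = 1, 1
    for i in range(d):
        for j in range(i + 1, d):
            num *= young[i] - young[j] + j - i
            den *= j - i
    return num // den


def young_indices(n, d, cap=None):
    """All Young indices (n_1 >= ... >= n_d >= 0) with n_1 + ... + n_d = n."""
    cap = n if cap is None else cap
    if d == 1:
        return [(n,)] if n <= cap else []
    out = []
    for first in range(min(n, cap), -1, -1):
        out += [(first,) + rest for rest in young_indices(n - first, d - 1, first)]
    return out


if __name__ == "__main__":
    n, d = 6, 2
    print(sum(dim_U(y) * dim_V(y) for y in young_indices(n, d)), d ** n)
```

The quantity `dim_V(young)` is the Schmidt rank of the maximally entangled state $|\mathcal{V}_{\mathbf{n}}\rangle$ appearing in Lemma 1.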

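Proof Write

$$|\phi\rangle^{\otimes n} = \sum_{\mathbf{n},\mathbf{n}'} \sum_{i,j,k,l} b_{\mathbf{n},i,j,\mathbf{n}',k,l}\, |e_i^{\mathcal{U}_{\mathbf{n},A}}\rangle |e_j^{\mathcal{V}_{\mathbf{n},A}}\rangle |e_k^{\mathcal{U}_{\mathbf{n}',B}}\rangle |e_l^{\mathcal{V}_{\mathbf{n}',B}}\rangle,$$

where $\{|e_i^{\mathcal{U}_{\mathbf{n},A}}\rangle\}$, $\{|e_j^{\mathcal{V}_{\mathbf{n},A}}\rangle\}$, $\{|e_k^{\mathcal{U}_{\mathbf{n},B}}\rangle\}$, $\{|e_l^{\mathcal{V}_{\mathbf{n},B}}\rangle\}$ are complete orthonormal bases in $\mathcal{U}_{\mathbf{n},A}$, $\mathcal{V}_{\mathbf{n},A}$, $\mathcal{U}_{\mathbf{n},B}$, $\mathcal{V}_{\mathbf{n},B}$, respectively. Establish a correspondence between the vector $|\phi\rangle^{\otimes n}$ in the bipartite system and the operator

$$\Phi^n := \sum_{\mathbf{n},\mathbf{n}'} \sum_{i,j,k,l} b_{\mathbf{n},i,j,\mathbf{n}',k,l}\, |e_i^{\mathcal{U}_{\mathbf{n},A}}\rangle |e_j^{\mathcal{V}_{\mathbf{n},A}}\rangle \langle e_k^{\mathcal{U}_{\mathbf{n}',B}}| \langle e_l^{\mathcal{V}_{\mathbf{n}',B}}|$$

using the 'partial transpose', i.e., the linear map which maps $|e_k^{\mathcal{U}_{\mathbf{n}',B}}\rangle |e_l^{\mathcal{V}_{\mathbf{n}',B}}\rangle$ to $\langle e_k^{\mathcal{U}_{\mathbf{n}',B}}| \langle e_l^{\mathcal{V}_{\mathbf{n}',B}}|$. Since this map is one to one, we study $\Phi^n$ instead of $|\phi\rangle^{\otimes n}$.

Observe that $\Phi^n$ is invariant under the action of any permutation $\sigma$, where the action of $\sigma$ is the one defined at the beginning of Subsection III A. Due to Lemma 14, $b_{\mathbf{n},i,j,\mathbf{n}',k,l} = 0$ unless $\mathbf{n} = \mathbf{n}'$, and

$$\Phi^n = \bigoplus_{\mathbf{n}} \sum_{i,j,k,l} b_{\mathbf{n},i,j,\mathbf{n},k,l}\, |e_i^{\mathcal{U}_{\mathbf{n},A}}\rangle \langle e_k^{\mathcal{U}_{\mathbf{n},B}}| \otimes |e_j^{\mathcal{V}_{\mathbf{n},A}}\rangle \langle e_l^{\mathcal{V}_{\mathbf{n},B}}|.$$

Then, we apply Lemma 16 to the factor acting on $\mathcal{V}_{\mathbf{n}}$, proving that $\Phi^n$ is of the form

$$\Phi^n = \bigoplus_{\mathbf{n}} \sqrt{a^{\phi}_{\mathbf{n}}}\, \Phi_{\mathbf{n}} \otimes \frac{\mathrm{Id}_{\mathcal{V}_{\mathbf{n}}}}{\sqrt{\dim \mathcal{V}_{\mathbf{n}}}}, \qquad \frac{\mathrm{Id}_{\mathcal{V}_{\mathbf{n}}}}{\sqrt{\dim \mathcal{V}_{\mathbf{n}}}} = \frac{1}{\sqrt{\dim \mathcal{V}_{\mathbf{n}}}} \sum_{j=1}^{\dim \mathcal{V}_{\mathbf{n}}} |e_j^{\mathcal{V}_{\mathbf{n},A}}\rangle \langle e_j^{\mathcal{V}_{\mathbf{n},B}}|,$$

where $\Phi_{\mathbf{n}}$ is a linear map in $\mathcal{U}_{\mathbf{n}}$. To obtain the lemma, we simply take the 'partial transpose' of this again: apply the linear map which maps $\langle e_k^{\mathcal{U}_{\mathbf{n},B}}| \langle e_l^{\mathcal{V}_{\mathbf{n},B}}|$ to $|e_k^{\mathcal{U}_{\mathbf{n},B}}\rangle |e_l^{\mathcal{V}_{\mathbf{n},B}}\rangle$. Since this map is one to one, $\Phi^n$ is mapped to $|\phi\rangle^{\otimes n}$. By this map, $\frac{1}{\sqrt{\dim \mathcal{V}_{\mathbf{n}}}} \mathrm{Id}_{\mathcal{V}_{\mathbf{n}}}$ is mapped to $|\mathcal{V}_{\mathbf{n}}\rangle$, and $\Phi_{\mathbf{n}}$ is mapped to $|\phi_{\mathbf{n}}\rangle$. □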



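This lemma implies that there are maximally entangled states, $|\mathcal{V}_{\mathbf{n}}\rangle$, which are accessible without using any knowledge of the input state. The average amount of the accessible entanglement is decided by the coefficients $a^{\phi}_{\mathbf{n}}$, which vary with the Schmidt coefficients of the input $|\phi\rangle$.

Now we are in a position to present our universal distortion-free entanglement concentration protocol $\{C_*^n\}$ (hereafter, the projection onto a Hilbert space $\mathcal{X}$ is also denoted by $\mathcal{X}$): First, each party applies the projection measurements $\{\mathcal{W}_{\mathbf{n}_A,A}\}_{\mathbf{n}_A}$ and $\{\mathcal{W}_{\mathbf{n}_B,B}\}_{\mathbf{n}_B}$ at each site independently. This yields the same measurement result $\mathbf{n}_A = \mathbf{n}_B = \mathbf{n}$ at both sites, and the state is changed to $|\phi_{\mathbf{n}}\rangle \otimes |\mathcal{V}_{\mathbf{n}}\rangle$. Taking the partial trace over $\mathcal{U}_{\mathbf{n},A}$ and $\mathcal{U}_{\mathbf{n},B}$ at each site, we obtain $|\mathcal{V}_{\mathbf{n}}\rangle$.

If $\mathcal{H}_A$ and $\mathcal{H}_B$ are qubit systems, $\{\mathcal{W}_{\mathbf{n}_x,x}\}_{\mathbf{n}_x}$ is nothing but the measurement of the total angular momentum.

For the sake of the formalism, $|\mathcal{V}_{\mathbf{n}}\rangle$ is mapped to $\|\dim \mathcal{V}_{\mathbf{n}}\rangle$. With this modification, $\rho^{\phi}_{C_*^n}(x) = \|2^{nx}\rangle\langle 2^{nx}\|$ and $Q^{\phi}_{C_*^n}(x) = a^{\phi}_{\mathbf{n}}$, if $2^{nx} = \dim \mathcal{V}_{\mathbf{n}}$ (if such $\mathbf{n}$ does not exist, $Q^{\phi}_{C_*^n}(x) = 0$).

Due to the identity $Q^{\phi}_{C_*^n}\left(\frac{\log \dim \mathcal{V}_{\mathbf{n}}}{n}\right) = a^{\phi}_{\mathbf{n}} = \mathrm{Tr}\left\{\mathcal{W}_{\mathbf{n},A} \left(\mathrm{Tr}_B |\phi\rangle\langle\phi|\right)^{\otimes n}\right\}$ and the formulas in the appendix of [8], we can evaluate the asymptotic behavior of $Q^{\phi}_{C_*^n}(x)$ as follows: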



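$$\left| \frac{\log \dim \mathcal{V}_{\mathbf{n}}}{n} - H\left(\frac{\mathbf{n}}{n}\right) \right| \le \frac{d^2 + 2d}{2n} \log(n + d),$$

$$\lim_{n \to \infty} \frac{-1}{n} \log Q^{\phi}_{C_*^n}\left(\frac{\log \dim \mathcal{V}_{\mathbf{n}}}{n}\right) = D\left(\frac{\mathbf{n}}{n} \Big\| \mathbf{p}_\phi\right),$$

$$\lim_{n \to \infty} \frac{-1}{n} \log \sum_{\frac{\mathbf{n}}{n} \in \mathcal{R}} Q^{\phi}_{C_*^n}\left(\frac{\log \dim \mathcal{V}_{\mathbf{n}}}{n}\right) = \min_{\mathbf{q} \in \mathcal{R}} D(\mathbf{q} \| \mathbf{p}_\phi), \qquad (3)$$

where $\mathcal{R}$ is an arbitrary closed subset of $\{\mathbf{q} \mid q_1 \ge q_2 \ge \cdots \ge q_d \ge 0, \sum_{i=1}^{d} q_i = 1\}$. These mean that the probability for $\frac{1}{n} \log \dim \mathcal{V}_{\mathbf{n}} \approx H(\mathbf{p}_\phi)$ is exponentially close to unity, as is demonstrated in Subsection III B.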
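For qubits ($d = 2$) the distribution $Q^{\phi}_{C_*^n}$ can be tabulated directly. The following sketch is an illustration, not the authors' code. It assumes the standard consequence of Schur-Weyl duality that $\mathrm{Tr}\{\mathcal{W}_{\mathbf{n},A}\rho^{\otimes n}\}$ equals $\dim \mathcal{V}_{\mathbf{n}}$ times the Schur polynomial of the eigenvalues of $\rho$, which for $\mathbf{n} = (n_1, n_2)$ and eigenvalues $(p, 1-p)$ is $\sum_{k=n_2}^{n_1} p^k (1-p)^{n-k}$. The function name is hypothetical.

```python
from math import comb, log2


def qubit_yield_distribution(p, n):
    """Return {x: Q(x)} for d = 2 and Schmidt coefficients (p, 1 - p).

    For the Young index (n1, n2) with n1 + n2 = n:
      dim V = C(n, n2) - C(n, n2 - 1),
      a     = dim V * sum_{k = n2}^{n1} p^k (1 - p)^(n - k),
    and the yield is x = log2(dim V) / n.
    """
    q = 1.0 - p
    dist = {}
    for n2 in range(n // 2 + 1):
        n1 = n - n2
        dim_v = comb(n, n2) - (comb(n, n2 - 1) if n2 > 0 else 0)
        schur = sum(p ** k * q ** (n - k) for k in range(n2, n1 + 1))
        x = log2(dim_v) / n
        # Different Young indices may share the same dimension, so accumulate.
        dist[x] = dist.get(x, 0.0) + dim_v * schur
    return dist


if __name__ == "__main__":
    dist = qubit_yield_distribution(0.8, 200)
    mean_yield = sum(x * w for x, w in dist.items())
    print(sum(dist.values()), mean_yield)
```

Printing the mean yield for increasing `n` gives a quick way to watch the yield approach the entropy rate, in line with the concentration expressed by (3).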
B. Asymptotic Performance of $\{C_*^n\}$

In this subsection, we analyze the asymptotic performance of $\{C_*^n\}$ in terms of the success (failure) probability, the total fidelity, and the average of the log of the Schmidt rank of the output maximally entangled states. (The proof of the optimality of $\{C_*^n\}$ is made for a more general class of measures.) Since the main difficulty of universal concentration is attributed to the uncertainty about the Schmidt basis, we consider the value in the worst-case Schmidt basis.

The worst-case value of the failure probability, or the probability that the yield is not more than $y$, equals

$$\max_{U,V} \sum_{x: x \le y} Q^{U \otimes V \phi}_{C^n}(x), \qquad (4)$$

where $U$ and $V$ run over all unitary matrices. Since the yield of our protocol $\{C_*^n\}$ is invariant under local unitary operations, the maximum over $U$ and $V$ can be removed. Due to the first and the third formulas in (3), we have

$$\lim_{n \to \infty} \frac{-1}{n} \log \sum_{x: x \le R} Q^{\phi}_{C_*^n}(x) = D(R \| \mathbf{p}_\phi), \qquad (5)$$

and

$$\lim_{n \to \infty} \frac{-1}{n} \log \sum_{x: x \ge R} Q^{\phi}_{C_*^n}(x) = D(R \| \mathbf{p}_\phi), \qquad (6)$$

where

$$D(R \| \mathbf{p}) := \begin{cases} \min_{\mathbf{q}: H(\mathbf{q}) \ge R} D(\mathbf{q} \| \mathbf{p}) & (H(\mathbf{p}) < R), \\ \min_{\mathbf{q}: H(\mathbf{q}) \le R} D(\mathbf{q} \| \mathbf{p}) & (H(\mathbf{p}) \ge R). \end{cases}$$

Eq. (5) implies that our protocol achieves the entropy rate: if $R$ is strictly smaller than $H(\mathbf{p}_\phi)$, the RHS of Eq. (5) is positive, which means that the failure probability is exponentially small. On the other hand, Eq. (6) means that the probability of having a yield larger than the optimal rate (strong converse probability) tends to vanish, and that its convergence is exponentially fast.

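Next, we evaluate the exponent of the total fidelity $F^{\phi}_{C^n}(R)$, or the average fidelity to the maximally entangled state whose Schmidt rank is not smaller than $2^{nR}$:

$$F^{\phi}_{C^n}(R) := \mathrm{E}^{\phi}_{Q_{C^n}} \max_{y: y \ge R} \langle 2^{ny}\| \rho^{\phi}_{C^n}(X) \|2^{ny}\rangle. \qquad (7)$$

(The optimization is considered in the worst-case Schmidt basis.) This function describes the trade-off between yield and distortion. Obviously, $F^{\phi}_{C^n}(R)$ is non-increasing in $R$, and takes a larger value if the protocol is better. We evaluate this quantity for $\{C_*^n\}$ as follows:

$$1 - F^{\phi}_{C_*^n}(R) = 1 - \sum_x \min\left\{1, 2^{-n(R-x)}\right\} Q^{\phi}_{C_*^n}(x) = \sum_{x: x < R} \left(1 - 2^{-n(R-x)}\right) Q^{\phi}_{C_*^n}(x).$$

The RHS is upper-bounded by $\sum_{x: x < R} Q^{\phi}_{C_*^n}(x)$ and lower-bounded by $\left(1 - 2^{-n(R-x)}\right) Q^{\phi}_{C_*^n}(x)$, where $x$ can be any value strictly smaller than $R$. Hence, if $R < H(\mathbf{p}_\phi)$, letting $x$ be a value slightly smaller than $R$ such that $Q^{\phi}_{C_*^n}(x) \ne 0$, and using the second equation of (3), we have

$$\lim_{n \to \infty} \frac{-1}{n} \log\left(1 - F^{\phi}_{C_*^n}(R)\right) = D(R \| \mathbf{p}_\phi).$$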
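The closed form used above, in which an exact output $\|2^{nx}\rangle$ contributes $\min\{1, 2^{-n(R-x)}\}$ to the total fidelity, can be sketched in a few lines. This is a minimal illustration for a distortion-free protocol whose yield distribution is supplied as a dictionary; the function names and the toy distribution are assumptions made for the example.

```python
def best_fidelity(x, R, n):
    """max over y >= R of the fidelity between ||2^{nx}> and ||2^{ny}>.

    If x >= R the choice y = x gives fidelity 1; otherwise the best choice
    is y = R, with fidelity 2^{-n(R - x)}.
    """
    return min(1.0, 2.0 ** (-n * (R - x)))


def total_fidelity(dist, n, R):
    """F(R) = sum_x min{1, 2^{-n(R - x)}} Q(x) for an exact (distortion-free) output."""
    return sum(w * best_fidelity(x, R, n) for x, w in dist.items())


if __name__ == "__main__":
    toy = {0.0: 0.5, 1.0: 0.5}          # toy yield distribution {x: Q(x)}
    print(total_fidelity(toy, n=2, R=0.5))
```

Only the outcomes with $x < R$ reduce the total fidelity, which is why $1 - F$ behaves like the failure probability on the exponential scale.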



The exponents of the failure probability, the strong converse probability, and the total fidelity for the optimal non-universal protocol are found in [15], and we can observe a non-zero gap between the exponents of $\{C_*^n\}$ and those of the optimal non-universal protocol. By contrast, these quantities for the BBPS protocol coincide with the ones for $\{C_*^n\}$. (The proof is straightforward using the classical type theory.)

This fact may imply that the protocol $\{C_*^n\}$ is so well-designed that its performance is comparable with that of a protocol which uses some information about the input state. However, it might be the case that these quantities are not sensitive to differences in performance. Hence, we also discuss another quantity, the average yield (evaluated at the worst-case Schmidt basis),

$$\min_{U,V} \sum_x x\, Q^{U \otimes V \phi}_{C^n}(x) = \min_{U,V} \mathrm{E}^{U \otimes V \phi}_{Q_{C^n}} X. \qquad (8)$$

The average yield of the BBPS protocol is of the form

$$H(\mathbf{p}_\phi) + A \frac{\log n}{n} + \frac{B}{n} + o\left(\frac{1}{n}\right),$$

where the coefficients $A$, $B$ and their derivation are described in Appendix B. The average yield of the protocol $\{C_*^n\}$ is less than that of the BBPS protocol by $\frac{C}{n}$, where $C$ is calculated in Appendix C. Hence, this measure is sensitive to a difference in performance which is not revealed by the exponents of the failure probability etc.



C. Comparison with estimation-based protocols

Most straightforwardly, a universal entanglement concentration protocol can be constructed based on state estimation: first, $c_n$ copies of $|\phi\rangle$ are used to estimate the Schmidt basis, and second, the BBPS protocol is applied to the remaining $n - c_n$ copies of $|\phi\rangle$. The average yield of such a protocol cannot be better than

$$\frac{n - c_n}{n} H(\mathbf{p}_\phi) + A \frac{\log(n - c_n)}{n - c_n} + O\left(\frac{1}{n}\right),$$

where $c_n$ slowly grows as $n$ increases. Therefore, this estimation-based protocol cannot be better than $\{C_*^n\}$, because the average yields of $\{C_*^n\}$ and the BBPS protocol are the same except for $O\left(\frac{1}{n}\right)$-terms.

One might improve the estimation-based protocol by replacing the BBPS protocol with the non-asymptotically optimal entanglement concentration protocol. However, this improvement is not likely to be effective, because in the qubit case, the yields of these protocols are the same up to the order of $O\left(\frac{\log n}{n}\right)$ (Appendix D; the $O\left(\frac{1}{n}\right)$-term is also given).

Another alternative is to use precise measurements which cause only negligible distortion, so that we can use all the given copies of an unknown state for entanglement concentration. This protocol can be very good, and there might be many other good protocols. As is proven in the next section, however, none of these protocols is better than $\{C_*^n\}$, i.e., $\{C_*^n\}$ is optimal among all protocols whose outputs are slightly distorted.



IV. OPTIMALITY OF $\{C_*^n\}$
A. Measures, settings, and summary of results

The performance of an entanglement concentration has two parts. One is the amount of yield, and the other is the distortion of the output. The measures of the latter are, as is explained in Section II, $\epsilon^{\phi}_{C^n}$ and $\bar{\epsilon}^{\phi}_{C^n}$. Hereafter, the maxima of these quantities over all Schmidt bases ($\max_{U,V} \epsilon^{U \otimes V \phi}_{C^n}$ and $\max_{U,V} \bar{\epsilon}^{U \otimes V \phi}_{C^n}$) are discussed.

The measures of the yield (4), (8) discussed in the previous section are essentially of the form

$$\min_{U,V} \sum_x f(x)\, Q^{U \otimes V \phi}_{C^n}(x) = \min_{U,V} \mathrm{E}^{U \otimes V \phi}_{Q_{C^n}} f(X). \qquad (9)$$

So far, we have considered minimization for the error probability, and maximization for the average yield. Hereafter, we use the success probability

$$\min_{U,V} \mathrm{E}^{U \otimes V \phi}_{Q_{C^n}} \Theta(X - R),$$

with $\Theta(x)$ denoting the step function, instead of the error probability (4). From here to the end, optimization of a yield measure (9) means maximization of (9).
Namely, minimization of (4) corresponds to maximization of (9) with $f(x) = \Theta(x - R)$. Also, maximization of (8) is equivalent to maximization of (9) with $f(x) = \frac{x}{\log d}$.
These examples are monotone and bounded, or

$$f(x) \ge f(x') \ge f(0) = 0 \quad (x \ge x' \ge 0), \qquad (10)$$

$$f(\log d) = 1, \qquad (11)$$

and

continuously differentiable except at finitely many points. (12)

Conditions (10) and (11) are assumed throughout the paper unless otherwise mentioned.

In the following subsections, measures of the form (9) are optimized with the restriction on the worst-case distortion $\max_{U,V} \epsilon^{U \otimes V \phi}_{C^n}$ or the average distortion $\max_{U,V} \bar{\epsilon}^{U \otimes V \phi}_{C^n}$.

Also, we consider the optimization (maximization) of measures which vary with both yield and distortion. Namely, the weighted sum of the yield measure and the average distortion $\bar{\epsilon}^{\phi}_{C^n}$, i.e.,

$$\min_{U,V} \mathrm{E}^{U \otimes V \phi}_{Q_{C^n}} f(X) - \lambda \max_{U,V} \bar{\epsilon}^{U \otimes V \phi}_{C^n} \qquad (13)$$

and the total fidelity $\max_{U,V} F^{U \otimes V \phi}_{C^n}(R)$ are considered.

We prove that the protocol is optimal in the following senses.

1. The entropy rate is achieved (Subsection III B, Eq. (5)).

2. The non-asymptotic behavior is best among all distortion-free protocols (Subsection IV C).

3. The higher order asymptotic behavior is best among all protocols which allow small distortions (Subsections IV D, IV E).

4. In terms of the weighted sum measures (13) and the total fidelity, the non-asymptotic optimality holds (Subsections IV F-IV G).

The key to the proof of these assertions is Lemma 3, which will be proved in the next subsection. Due to this lemma, we can focus on the protocols which are modifications of $\{C_*^n\}$ in its classical output only. This fact not only simplifies the argument but also assures us that $\{C_*^n\}$ is a very natural protocol.

Here, we note that many of our results in this section generalize to the case where the Schmidt basis is unknown and the Schmidt coefficients are known. Such a generalization is possible if the optimization problem can be recast only in terms of a family of quantum states $\{U \otimes V |\phi\rangle\}_{U,V}$, where $|\phi\rangle$ is a given input state and $U$ and $V$ run over all of $\mathrm{SU}(d)$. This is trivially the case when we optimize a function of distortion and yield. This is also the case if conditions on distortion need to be imposed only on the given input state, and not on all the possible input states.



B. The Key Lemma 

In this subsection, we prove Lemma 3, which is the key to the arguments in the rest of the paper. To make the analysis easier, before the protocol starts, each party applies $U^{\otimes n}$, $V^{\otimes n}$ at each site, where $U$ and $V$ are chosen randomly according to the Haar measure in $\mathrm{SU}(d)$, and erases the memory of $U$, $V$. This operation is denoted by O1, hereafter.

From here to the end of the paper, $C_*^n$ means the composition of O1 followed by $C_*^n$. The optimality of the newly defined $\{C_*^n\}$ trivially implies the optimality of $\{C_*^n\}$ defined previously, because O1 simply randomizes the output and cannot improve the performance;

$$\mathrm{O1}: \rho \mapsto \mathrm{E}_{U,V} (U \otimes V)^{\otimes n} \rho\, (U^{\dagger} \otimes V^{\dagger})^{\otimes n},$$

where $\mathrm{E}_{U,V}$ denotes the expectation by the Haar measure in $\mathrm{SU}(d)$. (More explicitly,

$$\mathrm{E}_{U,V} f(U, V) = \int f(U, V)\, \mu(\mathrm{d}U)\, \mu(\mathrm{d}V),$$

where $\mu$ is the Haar measure with the convention $\int \mu(\mathrm{d}U) = 1$.)
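Lemma 17 implies that $\mathcal{U}_{\mathbf{n},A} \otimes \mathcal{U}_{\mathbf{n},B}$ is an irreducible space of the tensored representation $U^{\otimes n} \otimes V^{\otimes n}$ of $\mathrm{SU}(d) \times \mathrm{SU}(d)$. Hence, by virtue of Lemmas 14-15, the average state writes

$$\mathrm{E}_{U,V} \left(U \otimes V |\phi\rangle\langle\phi| U^{\dagger} \otimes V^{\dagger}\right)^{\otimes n} = \bigoplus_{\mathbf{n}} a^{\phi}_{\mathbf{n}} \sigma_{\mathbf{n}} \qquad (14)$$

and

$$\sigma_{\mathbf{n}} := \frac{\mathcal{U}_{\mathbf{n},A} \otimes \mathcal{U}_{\mathbf{n},B} \otimes |\mathcal{V}_{\mathbf{n}}\rangle\langle\mathcal{V}_{\mathbf{n}}|}{\dim\{\mathcal{U}_{\mathbf{n},A} \otimes \mathcal{U}_{\mathbf{n},B}\}}.$$

We denote by O2 the projection measurement $\{\mathcal{W}_{\mathbf{n}_A,A} \otimes \mathcal{W}_{\mathbf{n}_B,B}\}_{\mathbf{n}_A,\mathbf{n}_B}$, which maps the state $\rho$ to

$$\left(\mathbf{n}_A, \mathbf{n}_B, \frac{\mathcal{W}_{\mathbf{n}_A,A} \otimes \mathcal{W}_{\mathbf{n}_B,B}\, \rho\, \mathcal{W}_{\mathbf{n}_A,A} \otimes \mathcal{W}_{\mathbf{n}_B,B}}{\mathrm{tr}\, \rho\, \mathcal{W}_{\mathbf{n}_A,A} \otimes \mathcal{W}_{\mathbf{n}_B,B}}\right)$$

with probability

$$\mathrm{tr}\, \rho\, \mathcal{W}_{\mathbf{n}_A,A} \otimes \mathcal{W}_{\mathbf{n}_B,B}.$$

Here note that, due to the form of $\sigma_{\mathbf{n}}$, $\mathbf{n}_A = \mathbf{n}_B := \mathbf{n}$, so long as the input is many copies of a pure state. Given a pair $(\mathbf{n}, \sigma_{\mathbf{n}})$ of classical information and a state supported on $\mathcal{W}_{\mathbf{n},A} \otimes \mathcal{W}_{\mathbf{n},B}$, the operation O3 outputs $(\mathbf{n}, \mathrm{tr}_{\mathcal{U}_{\mathbf{n},A} \otimes \mathcal{U}_{\mathbf{n},B}} \sigma_{\mathbf{n}})$.

Denoting the composition of an operation $A$ followed by an operation $B$ as $B \circ A$, $C_*^n$ writes $\mathrm{O3} \circ \mathrm{O2} \circ \mathrm{O1}$, essentially. (The mapping from $|\mathcal{V}_{\mathbf{n}}\rangle$ to $\|\dim \mathcal{V}_{\mathbf{n}}\rangle$ is needed only for the sake of formality.) Here, in defining $B \circ A$, if $A$'s output is a pair $(\mathbf{n}, \rho_{\mathbf{n}})$ of classical information and a quantum state, we always consider the correspondence

$$(\mathbf{n}, \rho_{\mathbf{n}}) \mapsto |\mathbf{n}\rangle\langle\mathbf{n}| \otimes U_{\mathbf{n}} \rho_{\mathbf{n}} U_{\mathbf{n}}^{\dagger}, \qquad (15)$$

where $\{|\mathbf{n}\rangle\}$ is an orthonormal basis, and $U_{\mathbf{n}}$ is a local isometry to an appropriately defined Hilbert space. Here, 'local' is in terms of the A-B partition. In terms of this convention, the definition of O3 rewrites

$$\mathrm{O3}: |\mathbf{n}\rangle\langle\mathbf{n}| \otimes U_{\mathbf{n}} \sigma_{\mathbf{n}} U_{\mathbf{n}}^{\dagger} \mapsto |\mathbf{n}\rangle\langle\mathbf{n}| \otimes U'_{\mathbf{n}} \left(\mathrm{tr}_{\mathcal{U}_{\mathbf{n},A} \otimes \mathcal{U}_{\mathbf{n},B}} \sigma_{\mathbf{n}}\right) U'^{\dagger}_{\mathbf{n}}, \qquad (16)$$

using local isometries $U_{\mathbf{n}}$ and $U'_{\mathbf{n}}$. (The domains of $U_{\mathbf{n}}$ and $U'_{\mathbf{n}}$ are $\mathcal{W}_{\mathbf{n},A} \otimes \mathcal{W}_{\mathbf{n},B}$ and $\mathcal{V}_{\mathbf{n},A} \otimes \mathcal{V}_{\mathbf{n},B}$, respectively.)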

Recall that all the measures listed in the previous subsection are invariant under local unitary operations on the input, i.e., the measure $f_n(\rho, \{C^n\})$ satisfies

$$f_n(\rho, \{C^n\}) = f_n(U \otimes V \rho\, U^{\dagger} \otimes V^{\dagger}, \{C^n\}), \quad \forall U, V \in \mathrm{SU}(d), \qquad (17)$$

and is affine with respect to $\rho$,

$$f_n(p\rho + (1-p)\sigma, \{C^n\}) = p f_n(\rho, \{C^n\}) + (1-p) f_n(\sigma, \{C^n\}). \qquad (18)$$

Recall also that the worst-case/average distortions are affine. Hereafter, the worst-case/average distortions are always evaluated at the worst-case Schmidt basis, so that those measures satisfy (17).

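Lemma 2 For any given protocol $\{C^n\}$, we can find a protocol such that: (i) the protocol is of the form $\{B^n \circ C_*^n\}$, where $B^n$ is an LOCC operation; (ii) a performance measure satisfying (17) and (18) takes the same value as for the protocol $\{C^n\}$,

$$f(\rho, \{B^n \circ C_*^n\}) = f(\rho, \{C^n\}).$$

Proof Due to (17) and (18), the operation O1 does not decrease the measure of the performance, because

$$f_n(\mathrm{E}_{U,V} U \otimes V \rho\, U^{\dagger} \otimes V^{\dagger}, \{C^n\}) = \mathrm{E}_{U,V} f_n(U \otimes V \rho\, U^{\dagger} \otimes V^{\dagger}, \{C^n\}) = \mathrm{E}_{U,V} f_n(\rho, \{C^n\}) = f_n(\rho, \{C^n\}).$$

Hence, $\{C^n \circ \mathrm{O1}\}$ is the same as $\{C^n\}$ in performance.

After the operation O1, the state is block diagonal in the subspaces $\{\mathcal{W}_{\mathbf{n},A} \otimes \mathcal{W}_{\mathbf{n},B}\}$. Therefore, if we use the correspondence (15), the state is unchanged by O2 (up to a local isometry). More explicitly, let $U_{\mathbf{n}}$ be a local isometry in (16), and $C'^n = C^n \circ U_{\mathbf{n}}^{\dagger}$. Then, we have

$$f(\rho, \{C'^n \circ \mathrm{O2} \circ \mathrm{O1}\}) = f(\rho, \{C^n \circ \mathrm{O1}\}).$$

Observe also that, after the operation O1, the parts of the state which are supported on $\mathcal{U}_{\mathbf{n},A} \otimes \mathcal{U}_{\mathbf{n},B}$ are tensor product states. Hence, there is an operation $B^n$ such that

$$f(\rho, \{B^n \circ \mathrm{O3} \circ \mathrm{O2} \circ \mathrm{O1}\}) = f(\rho, \{C'^n \circ \mathrm{O2} \circ \mathrm{O1}\}),$$

because tensor product states can be reproduced locally whenever they are needed. More explicitly, $B^n = C'^n \circ B'^n$, where $B'^n$ is

$$B'^n: |\mathbf{n}\rangle\langle\mathbf{n}| \otimes U'_{\mathbf{n}} \rho\, U'^{\dagger}_{\mathbf{n}} \mapsto |\mathbf{n}\rangle\langle\mathbf{n}| \otimes U_{\mathbf{n}} \left(\frac{\mathcal{U}_{\mathbf{n},A} \otimes \mathcal{U}_{\mathbf{n},B}}{\dim\{\mathcal{U}_{\mathbf{n},A} \otimes \mathcal{U}_{\mathbf{n},B}\}} \otimes \rho\right) U_{\mathbf{n}}^{\dagger},$$

where $U_{\mathbf{n}}$ and $U'_{\mathbf{n}}$ are the local isometries in (16).

After all, $f(\rho, \{B^n \circ C_*^n\}) = f(\rho, \{C^n\})$ and the lemma is proved. □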

In the postprocessing $B^n$, a classical output $x$ will be changed to $x + \Delta$ with probability $Q^n(x + \Delta | x)$, accompanied by some SLOCC operations on the quantum output. In the following lemma, for a given $Q^n(y|x)$, $\tilde{Q}^n(y|x)$ is a transition matrix such that $\tilde{Q}^n(y|x) = Q^n(y|x)$ for $y > x$ and $\tilde{Q}^n(y|x) = 0$ for $y < x$, and $\tilde{Q}^n(x|x) := Q^n(x|x) + \sum_{y < x} Q^n(y|x)$.

Lemma 3 In optimizing (maximizing) (i)-(iii), we can restrict ourselves to the protocols satisfying (a)-(c).
(i) (9) under the constraint on the worst-case/average distortion
(ii) the weighted sum (13)
(iii) the total fidelity (7).

(a) The protocol is of the form $\{B^n \circ C_*^n\}$.

(b) In $B^n$, the corresponding $Q^n(y|x)$ satisfies $Q^n(y|x) = 0$ for $y < x$.

(c) $B^n$ does not change the quantum output of $C_*^n$.

Proof The condition (a) follows from Lemma 2 for the worst-case/average distortion, (9), and the total fidelity (7), because they satisfy (17) and (18).

Since $f(x)$ is monotone increasing,

$$\sum_{x,y} f(y)\, Q^n(y|x)\, Q^{\phi}_{C_*^n}(x) \le \sum_{x,y} f(y)\, \tilde{Q}^n(y|x)\, Q^{\phi}_{C_*^n}(x),$$

where $\tilde{Q}^n(y|x)$ is a transition matrix such that $\tilde{Q}^n(y|x) = Q^n(y|x)$ for $y > x$ and $\tilde{Q}^n(y|x) = 0$ for $y < x$, and $\tilde{Q}^n(x|x) := Q^n(x|x) + \sum_{y<x} Q^n(y|x)$. Hence, $\tilde{Q}^n(y|x)$ improves on $Q^n(y|x)$ in the average yield (9), while the worst-case/average distortion is unchanged, as is proved later. Therefore, (b) applies to (i) and (ii).

To go on further, we have to find the optimal state transition made by the postprocessing. When the postprocessing $B^n$ changes the classical output $x$ to $y$, the corresponding quantum output $\rho_{B^n}(y|x)$ which minimizes the distortion, i.e., maximizes the fidelity to $\|2^{ny}\rangle$, is

$$\rho_{B^n}(y|x) = \begin{cases} \|2^{nx}\rangle\langle 2^{nx}\| & (y \ge x), \\ \|2^{ny}\rangle\langle 2^{ny}\| & (y < x), \end{cases} \qquad (19)$$

for the reasons stated shortly. In the case $y < x$, LOCC can change the output of $C_*^n$, $\|2^{nx}\rangle$, to $\|2^{ny}\rangle$ perfectly and deterministically. On the other hand, in the case $y > x$, monotonicity of the Schmidt rank under SLOCC implies that $\|2^{nx}\rangle$ is the best approximation to $\|2^{ny}\rangle$ among all the states which can be reached from $\|2^{nx}\rangle$ with non-zero probability. This transition causes the distortion of $1 - 2^{-n(y-x)}$.

From (19), it is easily understood that the worst-case/average distortion of $\tilde{Q}^n(y|x)$ equals that of $Q^n(y|x)$, and that the condition (c) applies to (i) and (ii).
It remains to prove (b) and (c) for (iii). Observe that the total fidelity (7) does not depend on the classical output of the protocol. Therefore, the condition (b) is not a restriction in the optimization, and we only have to prove (c). By definition,

$$F^{\phi}_{B^n \circ C_*^n}(R) = \sum_{x,y} Q^n(y|x)\, Q^{\phi}_{C_*^n}(x) \max_{z: z \ge R} \langle 2^{nz}\| \rho_{B^n}(y|x) \|2^{nz}\rangle.$$

In the case $x \ge R$, $\rho_{B^n}(y|x) = \|2^{nx}\rangle\langle 2^{nx}\|$ achieves

$$\sum_y Q^n(y|x) \max_{z: z \ge R} \langle 2^{nz}\| \rho_{B^n}(y|x) \|2^{nz}\rangle = 1,$$

which is maximal. In the case $x < R$, for any $z \ge R\ (> x)$, the maximum of $\langle 2^{nz}\| \rho_{B^n}(y|x) \|2^{nz}\rangle$ is achieved by $\rho_{B^n}(y|x) = \|2^{nx}\rangle\langle 2^{nx}\|$, because monotonicity of the Schmidt rank under SLOCC implies that $\|2^{nx}\rangle$ is the best approximation to $\|2^{nz}\rangle$ among all the states which can be reached from $\|2^{nx}\rangle$ with non-zero probability. Therefore, the optimal output state should be as described in (c). □

Now, the class of protocols of interest is very much restricted. We modify the classical output of $C_*^n$ according to the transition probability $Q^n(y|x)$, while its quantum output is untouched. Note that $Q^n(y|x)$ is non-zero only if $y \ge x$. In particular, a transition to $y$ strictly larger than $x$ means that the protocol claims the yield $y$ while in fact its yield is $x < y$. In other words, this is an excessive claim on its yield.
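The restriction to upward transitions can be made concrete with a small sketch. Under the assumption that the quantum output is left as $\|2^{nx}\rangle$, so that only a claim $y > x$ costs distortion $1 - 2^{-n(y-x)}$, moving all downward transition mass onto the diagonal never lowers a monotone yield measure and leaves the average distortion unchanged. The data layout (`Q[x][y]` for $Q^n(y|x)$, `base[x]` for the yield distribution of the unmodified protocol) and the function names are hypothetical.

```python
def fold_downward(Q):
    """Move the mass of all transitions y < x onto y = x (the modified matrix)."""
    folded = {}
    for x, row in Q.items():
        new_row = {y: w for y, w in row.items() if y > x}
        new_row[x] = row.get(x, 0.0) + sum(w for y, w in row.items() if y < x)
        folded[x] = new_row
    return folded


def yield_measure(Q, base, f):
    """sum_{x,y} f(y) Q(y|x) base(x), the classical yield measure after postprocessing."""
    return sum(f(y) * w * base[x] for x, row in Q.items() for y, w in row.items())


def average_distortion(Q, base, n):
    """Average distortion: only claims y > x are distorted, by 1 - 2^{-n(y - x)}."""
    return sum(w * base[x] * (1.0 - 2.0 ** (-n * (y - x)))
               for x, row in Q.items() for y, w in row.items() if y > x)


if __name__ == "__main__":
    base = {0.5: 1.0}
    Q = {0.5: {0.0: 0.25, 0.5: 0.5, 1.0: 0.25}}
    Qt = fold_downward(Q)
    identity = lambda x: x
    print(yield_measure(Q, base, identity), yield_measure(Qt, base, identity))
    print(average_distortion(Q, base, 2), average_distortion(Qt, base, 2))
```

In this toy example the folded matrix claims more yield on average at the same distortion, which is the content of condition (b) in Lemma 3.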

The main part of our effort in the following is to suppress the 'excessive claim', or $Q^n(y|x)$ for $y > x$, by setting an appropriate measure or constraint.

Note that the mathematical treatment is much simplified now, for we only have to optimize the transition probability $Q^n(y|x)$, with the condition that the distortion $1 - 2^{-n(y-x)}$ occurs only if $y > x$.

Observe that in the proof of these lemmas, we have used the uncertainty about the Schmidt basis. This assumption is needed to justify the condition (17). However, the uncertainty about the Schmidt coefficients has played no role. Therefore, Lemmas 2-3 hold true even in the case where the Schmidt coefficients are known.

Hereafter, maximization/minimization over local unitaries will often be removed, because the protocols of our interest are local unitary invariant.



C. Distortion-free protocols 

Theorem 4 $\{C_*^n\}$ achieves the optimal (maximal) value of (9) among all universal distortion-free concentrations for all finite $n$, any input state $|\phi\rangle$, and any threshold $R$. Here, $f$ only needs to be monotone increasing, and need not be bounded nor continuous.

Proof Lemma 3 applies to this case, for the distortion-free condition writes, using the invariant measure of distortion, $\max_{U,V} \epsilon^{U \otimes V \phi}_{C^n} = 0$. To increase the value of (9), $Q^n(x + \Delta | x)$ should be non-zero for some $x$, $\Delta$ with $\Delta > 0$, which causes non-zero distortion. Hence, it is impossible to improve the yield measure (9) by postprocessing. □

Observe that the proof also applies to the case where the Schmidt coefficients are known, for the condition $\max_{U,V} \epsilon^{U \otimes V \phi}_{C^n} = 0$ needs to be imposed only on the input state.



D. Constraints on the worst-case distortion 



In this subsection, we discuss the higher order asymptotic optimality of $\{C_*^n\}$ in terms of the average yield (9) under the constraint on the worst-case distortion,

$$\max_{U,V} \epsilon^{U \otimes V \phi}_{C^n} = \max_{\Delta:\, \exists x,\, Q^n(x+\Delta|x) \ne 0} \left(1 - 2^{-n\Delta}\right) \le r_n < 1,$$

which implies

$$Q^n(x + \Delta | x) = 0, \quad \Delta > \frac{-\log(1 - r_n)}{n}. \qquad (20)$$

This means that the magnitude of the improvement in the yield is uniformly upper-bounded by $\frac{-1}{n} \log(1 - r_n)$. Ineq. (20) is the key to the rest of the argument in this subsection.
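The relation between the distortion bound $r_n$ and the largest admissible over-claim $\Delta$ in (20) is a direct inversion of $1 - 2^{-n\Delta} \le r_n$. A minimal sketch, with hypothetical function names, is:

```python
from math import log2


def distortion_of_overclaim(delta, n):
    """Distortion 1 - 2^{-n * delta} caused by claiming x + delta while the output is ||2^{nx}>."""
    return 1.0 - 2.0 ** (-n * delta)


def max_overclaim(r_n, n):
    """Largest delta compatible with worst-case distortion r_n: -log2(1 - r_n) / n."""
    if not 0.0 <= r_n < 1.0:
        raise ValueError("r_n must satisfy 0 <= r_n < 1")
    return -log2(1.0 - r_n) / n


if __name__ == "__main__":
    n, r_n = 100, 0.01
    delta = max_overclaim(r_n, n)
    print(delta, distortion_of_overclaim(delta, n))
```

For fixed $r_n$ the admissible over-claim shrinks like $1/n$, which is why the possible gain over the distortion-free protocol appears only in higher order terms.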

In discussing the average yield (9), we assume that $f(x)$ is continuously differentiable around $x = H(\mathbf{p}_\phi)$. In addition, we first assume $f'(H(\mathbf{p}_\phi)) > 0$. After that, we study the case where $f'(x) = 0$ in the neighborhood of $x = H(\mathbf{p}_\phi)$. Typical examples of the former and the latter are $f(x) = \frac{x}{\log d}$ and $f(x) = \Theta(x - R)$, respectively.

Note that the argument in this section holds true also for the case where the Schmidt coefficients are known. This is because the constraint $\max_{U,V} \epsilon^{U \otimes V \phi}_{C^n} \le r_n$ needs to be imposed only on a given input state, and not on all states.

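Theorem 5 Suppose that $f$ is continuously differentiable in a region $(R_1, R_2)$ with $R_1 < H(p_\phi) < R_2$, that $r_n$ is smaller than $1-\delta$ with $\delta$ being a positive constant, and that $r_n$ is not exponentially small. Then,

(i) $\{C_*^n\}$ is optimal up to the order which is slightly larger than $O\left(\frac{r_n}{n}\right)$; that is, for any protocol $\{C^n\}$,

$$E^{\phi}_{C^n} f(X) \le E^{\phi}_{C_*^n} f(X) + O\left(\frac{r_n}{n}\right);$$

(ii) if $f'(H(p_\phi)) > 0$, there is a protocol $\{C^n\}$ which is better than $\{C_*^n\}$ by the magnitude of $O\left(\frac{r_n}{n}\right)$ for an input $|\phi\rangle$, or

$$E^{\phi}_{C^n} f(X) \ge E^{\phi}_{C_*^n} f(X) + O\left(\frac{r_n}{n}\right).$$

Applied to $f(x) = x/\log d$, (i) and (ii) imply that with the constraint $\max_{U,V}\epsilon^{U\otimes V\phi}_{C^n} \to 0$, $\{C_*^n\}$ is optimal up to $O\left(\frac{1}{n}\right)$-terms, and not optimal in the order smaller than that. Hence, the coefficients computed in Appendix D are optimal.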

Proof (i) Obviously, the optimal protocol $\{C^n\}$ is given by

$$Q^n(x+\Delta'|x) = 1, \quad \Delta' := -\frac{\log(1-r_n)}{n}. \qquad (21)$$

In the region $(R_1, R_2)$, with $c := \max_{x: R_1 \le x \le R_2} f'(x)$,

$$f(x+\Delta') \le f(x) + c\Delta' = f(x) - c\,\frac{\log(1-r_n)}{n}$$

holds. Because the function $-\log(1-x)$ is monotone and convex, if $r_n \le 1-\delta$, we have

$$f(x+\Delta') \le f(x) + c\,\frac{-\log(1-1+\delta)+\log(1-0)}{1-\delta}\,\frac{r_n}{n} = f(x) + c\,\frac{-\log\delta}{1-\delta}\,\frac{r_n}{n}.$$

Averaging both sides of this over $x$ yields

$$E^{\phi}_{C^n} f(X) \le E^{\phi}_{C_*^n} f(X) + c\,\frac{-\log\delta}{1-\delta}\,\frac{r_n}{n} + O(2^{-nD}), \quad \exists D > 0,$$

because the sum over the complement of $(R_1, R_2)$ is exponentially small due to the third equation of (3). This implies the optimality of our protocol.

(ii) Due to the mean value theorem,

$$f\left(x - \frac{\log(1-r_n)}{n}\right) \ge f(x) + c'\,\frac{-\log(1-r_n)}{n} \ge f(x) + (-c'\log\delta)\frac{r_n}{n}, \quad \exists c' > 0,$$

holds in a neighborhood of $x = H(p_\phi)$. Hence, letting $\{C^n\}$ be the protocol corresponding to (21), we have

$$E^{\phi}_{C^n} f(X) \ge E^{\phi}_{C_*^n} f(X) + (-c'\log\delta)\frac{r_n}{n} - O(2^{-nD}), \quad \exists D > 0,$$

proving the achievability. □

In the case of $f(x) = \Theta(x - R)$, which is flat around $x = H(p_\phi)$, part (ii) of this theorem does not apply and, as is shown below, the upper bound on the average yield suggested by (i) is not tight at all.

Theorem 6 Suppose $f'(x) = 0$ $(R_1 < x < R_2)$, $f(R_1 - 0) \neq f(H(p_\phi))$, $f(R_2 + 0) \neq f(H(p_\phi))$, and $\varlimsup_{n\to\infty} r_n < 1$. If $H(p_\phi) > R_1$,

$$\varlimsup_{n\to\infty} \frac{-1}{n}\log \sum_{x:\,x\le R_1} \left\{f(H(p_\phi)) - f(x)\right\} Q^{\phi}_{C^n}(x) \le D(R_1\|p_\phi)$$

holds. If $H(p_\phi) < R_2$,

$$\varliminf_{n\to\infty} \frac{-1}{n}\log \sum_{x:\,x \ge R_2}\left\{f(x) - f(H(p_\phi))\right\} Q^{\phi}_{C^n}(x) \ge D(R_2\|p_\phi),$$

and the equality is achieved by $\{C_*^n\}$.

This theorem intuitively means that, if $f(x)$ is flat in the neighborhood of $x = H(p_\phi)$, then for the optimal protocol the quantity (9) is approximately of the form

$$f(H(p_\phi)) - A\,2^{-nD(R_1\|p_\phi)} + B\,2^{-nD(R_2\|p_\phi)}.$$

Applied to $f(x) = \Theta(x - R)$, the theorem implies the optimality of (5) and (6) under the constraint $\varlimsup_{n\to\infty}\max_{U,V}\epsilon^{U\otimes V\phi}_{C^n} < 1$.

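Proof Suppose $H(p_\phi) > R_1$. For any $R < R_1$,

$$\sum_{x:\,x\le R_1}\left\{f(H(p_\phi)) - f(x)\right\}Q^{\phi}_{C^n}(x) \ge \sum_{x:\,x\le R}\left\{f(H(p_\phi))-f(x)\right\}Q^{\phi}_{C^n}(x) \ge \left\{f(H(p_\phi)) - f(R)\right\}\sum_{x:\,x\le R} Q^{\phi}_{C^n}(x),$$

where the second inequality is due to the monotonicity of $f$. On the other hand, (20) implies

$$\sum_{x:\,x\le R} Q^{\phi}_{C^n}(x) \ge \sum_{x:\,x\le R + \frac{\log(1-r_n)}{n}} Q^{\phi}_{C_*^n}(x).$$

Combining these inequalities with (5) leads to

$$\varlimsup_{n\to\infty}\frac{-1}{n}\log\sum_{x:\,x\le R_1}\left\{f(H(p_\phi))-f(x)\right\}Q^{\phi}_{C^n}(x) \le \varlimsup_{n\to\infty}\frac{-1}{n}\left\{\log\sum_{x:\,x\le R+\frac{\log(1-r_n)}{n}}Q^{\phi}_{C_*^n}(x) + \log\left\{f(H(p_\phi))-f(R)\right\}\right\} \le D(R\|p_\phi),$$

which, letting $R \to R_1$, leads to the first inequality. On the other hand, in the $H(p_\phi) < R_2$ case, the monotonicity of $f$ and (20) also imply

$$\sum_{x:\,x\ge R_2}\left\{f(x)-f(H(p_\phi))\right\}Q^{\phi}_{C^n}(x) \le \left\{f(\log d) - f(H(p_\phi))\right\}\sum_{x:\,x\ge R_2}Q^{\phi}_{C^n}(x) \le \left\{f(\log d)-f(H(p_\phi))\right\}\sum_{x:\,x\ge R_2 + \frac{\log(1-r_n)}{n}} Q^{\phi}_{C_*^n}(x).$$

Combining this with (6) leads to the second inequality.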

The achievability is proved as follows. Suppose $H(p_\phi) > R_1$. For $x$ smaller than $R_1$,

$$f(H(p_\phi)) - f(x) \le f(H(p_\phi)) = f(H(p_\phi))\left(1-\Theta(x-R_1)\right).$$

Hence, the exponent is lower bounded by

$$\lim_{n\to\infty}\frac{-1}{n}\log E^{\phi}_{C_*^n}\left\{1-\Theta(X-R_1)\right\} + \lim_{n\to\infty}\frac{-1}{n}\log f(H(p_\phi)) = D(R_1\|p_\phi),$$

which means the first inequality is achieved. Suppose $H(p_\phi) < R_2$. For $x$ larger than $R_2$, we have

$$f(x) - f(H(p_\phi)) \ge \left\{f(R_0) - f(H(p_\phi))\right\}\Theta(x-R_0),$$

where $R_0$ is an arbitrary constant with $R_0 > R_2$. Hence, the exponent is upper-bounded by

$$\lim_{n\to\infty}\frac{-1}{n}\log E^{\phi}_{C_*^n}\Theta(X-R_0) + \lim_{n\to\infty}\frac{-1}{n}\log\left\{f(R_0) - f(H(p_\phi))\right\} = D(R_0\|p_\phi).$$

Letting $R_0 \to R_2$, we have the achievability of the second inequality. □



E. Constraints on the average distortion 

In this subsection, we discuss the higher-order asymptotic optimality of $\{C_*^n\}$ in terms of the generalized average yield (9) under the constraint on the average distortion,

$$\max_{U,V}\epsilon^{U\otimes V\phi}_{C^n} = \sum_{\Delta} \left(1-2^{-n\Delta}\right) E^{\phi}_{C_*^n} Q^n(X+\Delta|X) \le r_n < 1.$$

Denote the probability that the improvement by the amount $\Delta$ occurs by

$$\mathrm{Pr}(\Delta) := E^{\phi}_{C_*^n} Q^n(X+\Delta|X).$$
Observe that

$$\epsilon^{\phi}_{C^n} = \sum_{\Delta}\left(1-2^{-n\Delta}\right)\mathrm{Pr}(\Delta) \ge \left(1-2^{-c}\right)\mathrm{Pr}\left\{\Delta\ge\frac{c}{n}\right\},$$

which implies

$$\mathrm{Pr}\left\{\Delta \ge \frac{c}{n}\right\} \le \frac{r_n}{1-2^{-c}}. \qquad (22)$$

Hence, the magnitude of the improvement is upper-bounded only in the average sense, in contrast with (20), which implies an upper bound uniform with respect to $x$.
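The Markov-type bound (22) can be checked numerically. The sketch below is an illustration only: it draws an arbitrary distribution of improvements $\Delta$, computes its average distortion, and compares the tail probability with the right-hand side of (22), where $r_n$ is taken to be the actual average distortion. All function names are hypothetical and logarithms are assumed to be base 2.

```python
import random

def average_distortion(n, dist):
    """Average of 1 - 2^{-n*delta}; dist maps delta >= 0 to Pr(delta)."""
    return sum((1.0 - 2.0 ** (-n * d)) * p for d, p in dist.items())

def tail_probability(n, dist, c):
    """Pr{delta >= c/n}."""
    return sum(p for d, p in dist.items() if d >= c / n)

def markov_bound(n, dist, c):
    """Right-hand side of (22), with r_n set to the actual average distortion."""
    return average_distortion(n, dist) / (1.0 - 2.0 ** (-c))

def random_improvement_distribution(rng, size=6, max_delta=0.2):
    """An arbitrary test distribution over improvements, including delta = 0."""
    deltas = [0.0] + [rng.uniform(0.0, max_delta) for _ in range(size - 1)]
    weights = [rng.random() for _ in deltas]
    total = sum(weights)
    return {d: w / total for d, w in zip(deltas, weights)}

if __name__ == "__main__":
    rng = random.Random(0)
    dist = random_improvement_distribution(rng)
    for c in (0.5, 1.0, 2.0):
        print(c, tail_probability(20, dist, c), markov_bound(20, dist, c))
```

The bound controls only the probability of a large improvement, not its size for each individual $x$, which is the difference from (20) noted above.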

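Suppose that $f$ is continuously differentiable all over the region $(0, \log d)$. Then, the improvement $x \to x + \Delta$ causes the distortion by the amount of

$$1-2^{-n\Delta} \ge \frac{1-d^{-n}}{\log d}\,\Delta \ge \frac{1-d^{-n}}{\log d}\,\frac{1}{c}\left\{f(x+\Delta)-f(x)\right\},$$

where $c = \max_{x:\,0\le x\le \log d} f'(x)$. Taking the average of both sides,

$$r_n \ge \frac{1-d^{-n}}{c\log d}\left(E^{\phi}_{C^n}f(X) - E^{\phi}_{C_*^n}f(X)\right),$$

or,

$$E^{\phi}_{C^n} f(X) \le E^{\phi}_{C_*^n}f(X) + \frac{c\log d}{1-d^{-n}}\, r_n.$$
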

On the other hand, let the protocol $\{C^n\}$ be the one corresponding to

$$Q^n(\log d|x) = r_n, \quad \forall x.$$

Then, we have

$$E^{\phi}_{C^n}f(X) = E^{\phi}_{C_*^n}f(X) + r_n E^{\phi}_{C_*^n}\left\{1-f(X)\right\} \ge E^{\phi}_{C_*^n} f(X) + r_n\left\{1-f(H(p_\phi)+c)\right\}\left(1-2^{-nD(H(p_\phi)+c\|p_\phi)}\right), \quad \forall c>0,$$

while the average distortion of $\{C^n\}$ is at most $r_n$.
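The randomized relabelling used in this construction is easy to simulate. The following sketch assumes a toy yield distribution for the base protocol (the dictionary `q_star` is invented for illustration, not taken from the paper) and $f(x) = x/\log d$ with base-2 logarithms. It returns the average yield measure before and after relabelling each outcome as $\log d$ with probability $r_n$, together with the resulting average distortion.

```python
import math

def boost_to_max(q_star, n, d, r_n, f):
    """Relabel every outcome x as log2(d) with probability r_n.

    q_star maps a yield x (0 <= x <= log2 d) to its probability under the
    base protocol. Returns (base measure, boosted measure, average distortion).
    """
    top = math.log2(d)
    base = sum(p * f(x) for x, p in q_star.items())
    gain = sum(p * (f(top) - f(x)) for x, p in q_star.items())
    boosted = base + r_n * gain
    # A jump of size top - x costs distortion 1 - 2^{-n (top - x)}.
    distortion = r_n * sum(
        p * (1.0 - 2.0 ** (-n * (top - x))) for x, p in q_star.items()
    )
    return base, boosted, distortion

if __name__ == "__main__":
    d = 2
    f = lambda x: x / math.log2(d)
    q_star = {0.2: 0.1, 0.5: 0.6, 0.9: 0.3}  # hypothetical toy distribution
    print(boost_to_max(q_star, n=50, d=d, r_n=0.05, f=f))
```

With this choice of $f$ the gain is exactly $r_n(1 - E f(X))$, that is, of order $r_n$, while the average distortion never exceeds $r_n$.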

Now, we extend these arguments to the case where finitely many discontinuous points exist. First, in the proof of the upper bound, it is sufficient for $f$ to be continuously differentiable in the neighborhood of $x = H(p_\phi)$, if the exponentially small terms are neglected. Second, the evaluation of the performance of the protocol constructed above does not rely on the differentiability of $f$. Therefore, we have the following theorem.

Theorem 7 (i) Suppose that $f$ is continuously differentiable in the neighborhood of $x = H(p_\phi)$. Suppose also that $r_n$ is not exponentially small. Then, if $\epsilon^{\phi}_{C^n} \le r_n$, $\{C_*^n\}$ is optimal in terms of (9), up to the order which is slightly larger than $O(r_n)$; that is, for any protocol $\{C^n\}$,

$$E^{\phi}_{C^n}f(X) \le E^{\phi}_{C_*^n}f(X)+O(r_n).$$

(ii) Suppose that $f(H(p_\phi)+c)<1$, $\exists c>0$, and $\epsilon^{\phi}_{C^n} \le r_n$. Then, there is a protocol $\{C^n\}$ which improves $\{C_*^n\}$ by the order of $O(r_n)$;

$$E^{\phi}_{C^n}f(X) \ge E^{\phi}_{C_*^n}f(X) + O(r_n).$$
Let us compare this theorem with Theorem 5, which states optimality results under the constraint on the worst-case distortion.
First, the yield is worse by the order of $\frac{1}{n}$. In particular, if $f(x) = x/\log d$, $r_n$ needs to be $o\left(\frac{\log n}{n}\right)$ for optimality up to a higher order term to be guaranteed. By contrast, under the constraint on the worst-case distortion, $r_n = o(1)$ is enough to certify optimality up to the third leading term.

Second, applied to the case of $f(x) = \Theta(x - R)$, $H(p_\phi) < R$, Theorem 7 implies the following. With $r_n = o(1)$, the success probability $\sum_{x:\,x\ge R} Q^{\phi}_{C^n}(x)$ vanishes (the strong converse holds), but the speed of convergence is at most as fast as $r_n$, which is not exponentially fast in general. Therefore, (6) is far from optimal unless $r_n$ decreases exponentially fast. By contrast, under the constraint on the worst-case distortion, a constant upper bound is enough to guarantee the optimality of the exponent (6).

Let us study the counterpart of Theorem 6, because Theorem 7, (ii) cannot be applied to the discussion of the optimality of the exponent (5), in which the rate $R$ is typically less than $H(p_\phi)$.



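Lemma 8 Suppose that

$$r \ge \varlimsup_{n\to\infty}\max_{U,V}\epsilon^{U\otimes V\phi}_{C^n}$$

holds for all $|\phi\rangle$. Then, for all $|\phi\rangle$, all $c > 0$, all $\delta > 0$, and all $R', R''$ with $R' > R''$, there is a sequence $\{x_n\}$ such that $R'' \le x_n \le R'$ and

$$\varlimsup_{n\to\infty}\sum_{\Delta:\,\Delta\ge\frac{c}{n}} Q^n(x_n+\Delta|x_n) \le \frac{r+\delta}{1-2^{-c}}$$

hold.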

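Proof Assume the lemma is false, i.e., there is a sequence $\{n_k\}$ such that for all $x$ in the interval $(R'', R')$,

$$\sum_{\Delta:\,\Delta\ge \frac{c}{n_k}}Q^{n_k}(x+\Delta|x) > \frac{r+\delta}{1-2^{-c}}$$

holds. Choosing $|\phi\rangle$ with $R' > H(p_\phi) > R''$, we have

$$\mathrm{Pr}_{\phi}\left\{\Delta\ge\frac{c}{n_k}\right\} \ge \sum_{x:\,R''\le x\le R'}\ \sum_{\Delta:\,\Delta\ge\frac{c}{n_k}} Q^{n_k}(x+\Delta|x)\,Q^{\phi}_{C_*^{n_k}}(x) \ge \frac{r+\delta}{1-2^{-c}}\left(1-2^{-n_k\left(\min\{D(R'\|p_\phi),\,D(R''\|p_\phi)\}-\delta'\right)}\right),$$

$$\forall\delta'>0,\ \exists k_1,\ \forall k\ge k_1,$$

which, combined with (22), implies

$$\frac{r+\delta}{1-2^{-c}}\left(1-2^{-n_k\left(\min\{D(R'\|p_\phi),\,D(R''\|p_\phi)\}-\delta'\right)}\right) \le \frac{r}{1-2^{-c}}.$$

This cannot hold when $n_k$ is large enough. Therefore, the lemma has to be true. □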



Theorem 9 Suppose that $f(x) = 1$ for $x \ge R$, and $H(p_\phi) > R$ holds. Then, if $\varlimsup_{n\to\infty}\max_{U,V}\epsilon^{U\otimes V\phi}_{C^n} < 1$ holds for all $|\phi\rangle$,

$$\varlimsup_{n\to\infty}\frac{-1}{n}\log E^{\phi}_{C^n}\left\{1-f(X)\right\} \le D(R\|p_\phi), \qquad (23)$$

and the equality is achieved by $\{C_*^n\}$.

Note that the premise of the theorem is the negation of the premise of Theorem 7, (ii). Note also that the constraint on the average distortion is very moderate, allowing constant distortion.
Proof First, we prove (23) for $f(x) = \Theta(x - R)$. Without loss of generality, we can assume that $Q^n(y|x)$ is non-zero only if $y = R$ and $x < R$. Therefore

$$1-E^{\phi}_{C^n}\left\{f(X)\right\} = 1-\sum_{x<R}Q^n(R|x)\,Q^{\phi}_{C_*^n}(x) = 1-\sum_{x}Q^n(R|x)\,Q^{\phi}_{C_*^n}(x) = \sum_{x}\left(1-Q^n(R|x)\right)Q^{\phi}_{C_*^n}(x).$$

Let $R', R''$ be real numbers with $R'' < R' < R$, and $\{x_n\}$ be a sequence given by Lemma 8. Then we have, due to Lemma 8,

$$1-E^{\phi}_{C^n}\left\{f(X)\right\} \ge \left\{1-Q^n(R|x_n)\right\}Q^{\phi}_{C_*^n}(x_n) \ge \left(1-\frac{r+\delta}{1-2^{-n(R-R')}}\right)Q^{\phi}_{C_*^n}(x_n), \quad \exists n_0,\ \forall n \ge n_0,$$

which implies

$$\varlimsup_{n\to\infty}\frac{-1}{n}\log\sum_{x}\left\{1-\Theta(x-R)\right\}Q^{\phi}_{C^n}(x) \le \varlimsup_{n\to\infty}\frac{-1}{n}\left\{\log Q^{\phi}_{C_*^n}(R'') + \log\left(1-\frac{r+\delta}{1-2^{-n(R-R')}}\right)\right\} = D(R''\|p_\phi).$$

Because this holds true for all $R'' < R' < R$, the limit $R'' \to R$ leads to the inequality (23) for $f(x) = \Theta(x - R)$.

As for $f$ which satisfies the premise of the theorem, we lower-bound $1 - f(x)$ by $(1 - f(R_0))(1 - \Theta(x - R_0))$, with $R_0 < R$. Then, the exponent is

$$\varlimsup_{n\to\infty}\frac{-1}{n}\log E^{\phi}_{C^n}\left\{1-f(X)\right\} \le \varlimsup_{n\to\infty}\frac{-1}{n}\log E^{\phi}_{C^n}\left(1-\Theta(X-R_0)\right) + \lim_{n\to\infty}\frac{-1}{n}\log\left(1-f(R_0)\right) = D(R_0\|p_\phi).$$

The limit $R_0 \to R$ leads to the inequality of the theorem. The achievability is proven in the same way as in the proof of Theorem 5. □

Note that the arguments in the proof of Theorem 7 also apply to the case where the Schmidt coefficients are known. This is because the constraint $\max_{U,V}\epsilon^{U\otimes V\phi}_{C^n} \le r_n$ is required only for the given input state, and not for all states. By contrast, the proof of Theorem 9 is valid only if the constraint $\max_{U,V}\epsilon^{U\otimes V\phi}_{C^n} \le r_n$ is assumed for all $|\phi\rangle$ (otherwise, Lemma 8 cannot be proved), meaning that the generalization to the case where the Schmidt coefficients are known is impossible.



F. Weighted sum of the distortion and the yield 

In this subsection, we discuss the weighted sum (13) of the distortion and the yield. First, we study the case where $f$ is continuously differentiable over the domain, and then the case where $f$ is an arbitrary monotone non-decreasing, bounded function. In this subsection, we prove (a sort of) non-asymptotic optimality. Finally, we apply our result to give another proof of Theorem 7. The argument in this subsection generalizes to the case where the Schmidt coefficients are known, as is explained toward the end of Subsection IV A.

To have a reasonable result, the weight $\lambda$ can be neither too small nor too large. In this subsection, we assume

$$\lambda > 1, \qquad (24)$$

because otherwise the yield $f(x)$ can take a value larger than the maximum value of the distortion, which equals unity. No explicit upper bound on $\lambda$ is assumed, but $\lambda$ is regarded as a constant only slightly larger than 1.

Due to Lemma 3, the difference between the value of the measure (13) of $\{C_*^n\}$ and that of the protocol characterized by $Q^n(x|y)$ is

$$\sum_{x,y:\,x>y}\left\{f(x)-f(y)-\lambda\left(1-2^{-n(x-y)}\right)\right\}Q^n(x|y)\,Q^{\phi}_{C_*^n}(y). \qquad (25)$$

For $\{C_*^n\}$, or $Q^n(x|y) = \delta_{xy}$, to be optimal, the coefficient for $Q^n(x|y)$ has to be non-positive, or,

$$f(x)-f(y)\le\lambda\left(1-2^{-n(x-y)}\right).$$

When $n$ is large enough, the RHS of this approximately equals $\lambda\Theta(x - y)$. Hence, this inequality holds, if $n$ is larger than some threshold $n_0$, for a wide variety of $f$'s. More rigorously, the condition for the optimality of $\{C_*^n\}$ reads

$$n \ge \frac{-1}{x-y}\log\left(1-\frac{f(x)-f(y)}{\lambda}\right), \quad \forall x,y,\ 0\le y<x\le\log d. \qquad (26)$$

Observe that $-\log(1 - x)$ is convex and monotone increasing, and $-\log(1 - 0) = 0$. Hence, the RHS of (26) is upper-bounded by

$$-\log\left(1-\frac{f(\log d)-f(0)}{\lambda}\right)\frac{1}{f(\log d)-f(0)}\,\frac{f(x)-f(y)}{x-y} = -\log\left(1-\frac{1}{\lambda}\right)\frac{f(x)-f(y)}{x-y}.$$

If the function $f$ is continuously differentiable, the last side of the equation is at most

$$-\log\left(1-\frac{1}{\lambda}\right)\max_{x:\,0\le x\le\log d}f'(x).$$



After all, we have the following theorem.

Theorem 10 If $f$ satisfies (10)-(12), $\{C_*^n\}$ is optimal, i.e., achieves the maximum of (13) with the weight (24), for any input state and any $n$ larger than the threshold $n_0$, where

$$n_0 = -\log\left(1-\frac{1}{\lambda}\right)\max_{x:\,0\le x\le\log d}f'(x).$$



Some comments on the theorem are in order. First, this assertion is different from so-called asymptotic optimality, in which the higher order terms are neglected. On the contrary, our assertion is more like a non-asymptotic argument, for we have proved that $\{C_*^n\}$ is optimal up to arbitrary order if $n$ is larger than some finite threshold.

Second, the factor $-\log\left(1 - \frac{1}{\lambda}\right)$ is relatively small even if $\lambda$ is very close to 1. For example, for $\lambda = 1.001$, $-\log\left(1 - \frac{1}{\lambda}\right) = 9.96\cdots$. Hence, the threshold value is not so large. For example, if $f(x) = x/\log d$, $n_0 = \frac{-\log\left(1-\frac{1}{\lambda}\right)}{\log d} = \frac{9.96\cdots}{\log d} \le 10$.
So far, $\lambda$ has been a constant, but let $\lambda = \frac{1}{1-d^{-n}}$, so that the range of $f$ and that of the distortion coincide with each other. (Note that $\frac{1}{1-d^{-n}}$ is only slightly larger than 1.) Then, if $f(x) = x/\log d$, the condition for the optimality of $\{C_*^n\}$ reads

$$n \ge \frac{-\log\left(1-\frac{1}{\lambda}\right)}{\log d} = \frac{-\log d^{-n}}{\log d} = n,$$

which holds for all $n \ge 1$, implying the non-asymptotic optimality.
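The threshold of Theorem 10 and the pointwise optimality condition can be explored numerically. The sketch below is a hedged illustration: it assumes base-2 logarithms and the yield function $f(x) = x/\log d$, evaluates the factor $-\log(1 - 1/\lambda)$, and checks the inequality $f(x) - f(y) \le \lambda(1 - 2^{-n(x-y)})$ on a finite grid. The function names and the grid resolution are arbitrary choices, and a grid check is only a sanity test, not a proof.

```python
import math

def threshold_factor(lam: float) -> float:
    """The factor -log2(1 - 1/lam) appearing in the threshold n_0."""
    return -math.log2(1.0 - 1.0 / lam)

def condition_holds(n: int, d: int, lam: float, grid: int = 200) -> bool:
    """Grid check of f(x) - f(y) <= lam * (1 - 2^{-n (x - y)}) for f(x) = x / log2(d)."""
    top = math.log2(d)
    pts = [top * k / grid for k in range(grid + 1)]
    for i, y in enumerate(pts):
        for x in pts[i + 1:]:
            lhs = (x - y) / top
            rhs = lam * (1.0 - 2.0 ** (-n * (x - y)))
            if lhs > rhs + 1e-12:  # small tolerance for rounding at equality
                return False
    return True

if __name__ == "__main__":
    print(threshold_factor(1.001))
    print(condition_holds(10, 2, 1.001), condition_holds(9, 2, 1.001))
    # lambda = 1 / (1 - d^{-n}) makes the condition tight at x - y = log2(d)
    print(all(condition_holds(n, 2, 1.0 / (1.0 - 2.0 ** (-n))) for n in range(1, 6)))
```

For $d = 2$ and $\lambda = 1.001$ the grid check passes at $n = 10$ and fails at $n = 9$, in line with the threshold $9.96\cdots$ quoted above.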

Now, let us study the case where $f$ is not differentiable, such as $f(x) = \Theta(x - R)$. In such a case, we regard the condition (26) as a restriction on $\Delta := x - y$ such that $Q^n(y + \Delta|y)$ for the optimal protocol takes a non-zero value for some $y$. Observe that $f(x) - f(y) \le 1$ is true for all functions satisfying (11). Therefore, for

$$\Delta \ge \frac{-\log\left(1-\frac{1}{\lambda}\right)}{n},$$

the $Q^n(y + \Delta|y)$'s for the optimal protocol vanish for all $y$. Hence, the improvement of $\{C_*^n\}$ is possible only in a very small range of $\Delta$ when $n$ is very large.

This analysis of the weighted sum measure can be applied to the proof of Theorem 7, (i). For this purpose, we use the Lagrangian

$$\mathcal{L} := \min_{U,V}E^{U\otimes V\phi}_{C^n}f(X)+\lambda\max_{U,V}\left(r_n-\epsilon^{U\otimes V\phi}_{C^n}\right),$$

with $\lambda > 1$, $r_n > 0$. Observe that (9) cannot be larger than $\mathcal{L}$ under the condition $\max_{U,V}\epsilon^{U\otimes V\phi}_{C^n} \le r_n$, because under this condition the second term is non-negative, and the first term of $\mathcal{L}$ is nothing but (9). In the case that $f$ is differentiable, Theorem 10 implies that the maximum of $\mathcal{L}$ $(\lambda > 1)$ is achieved by $\{C_*^n\}$. Therefore, since $\epsilon^{\phi}_{C_*^n} = 0$, we obtain the inequality

$$\max\,(9) \le (9)\ \text{for}\ C_*^n + \lambda r_n. \qquad (27)$$

A similar argument applies to the case where $f$ is continuously differentiable only in the neighborhood of $x = H(p_\phi)$, except for the exponentially small terms.



G. Total fidelity $F^{\phi}_{C^n}(R)$

This measure equals the fidelity between an output and a target, and the relation between $R$ and this quantity reflects the trade-off between yield and distortion. Note that the argument in this subsection also generalizes to the case where the Schmidt coefficients are known, as is explained toward the end of Subsection IV A.

Theorem 11 $\{C_*^n\}$ achieves the optimal (maximum) value of the total fidelity (7) among all the protocols, for any $n$, any input state $|\phi\rangle$, and any threshold $R$.

Proof Due to Lemma 3, the protocol of interest is a postprocessing of $\{C_*^n\}$ which does not touch its quantum output. The total fidelity of such a protocol equals that of $\{C_*^n\}$, for the total fidelity (7) is not related to the classical output of the protocol. □



V. UNIVERSAL CONCENTRATION AS AN ESTIMATE OF ENTANGLEMENT 

In this section, universal concentration is related to statistical estimation of the entanglement measure $H(p_\phi)$. Observe that

$$\hat{H}_{C^n} := \frac{1}{n}\log(\text{dim. of max. ent.})$$

is a natural estimate of $H(p_\phi)$. Letting $f(x) = \Theta(x - R)$, Theorem 7, (i) implies that the probability for $\hat{H}_{C^n} > H(p_\phi)$ tends to vanish, if $\lim_{n\to\infty}\epsilon^{\phi}_{C^n} < 1$, as is demonstrated right after the statement of the theorem. Therefore, if $\{C^n\}$ achieves the entropy rate, the estimate $\hat{H}_{C^n}$ converges to $H(p_\phi)$ in probability as $n \to \infty$ (a consistent estimate). In particular, for the estimate $\hat{H}_{C_*^n}$ which is based on $\{C_*^n\}$, the error exponent is given using (5) and (6) as

$$\lim_{n\to\infty}\frac{-1}{n}\log\max_{U,V}\mathrm{Pr}_{U\otimes V\phi}\left\{\left|\hat{H}_{C_*^n}-H(p_\phi)\right|\ge\delta\right\} = \min_{q:\,|H(q)-H(p_\phi)|\ge\delta}D(q\|p_\phi).$$

Now, we prove that this exponent is better than that of any other consistent estimate, which potentially uses global measurements, if the Schmidt basis is unknown.
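For a qubit pair ($d = 2$) the error exponent $\min_{|H(q)-H(p_\phi)|\ge\delta} D(q\|p_\phi)$ reduces to a one-dimensional minimization, which the following sketch evaluates by brute-force grid search. This is only an illustration under stated assumptions: the Schmidt coefficients are taken to be $(p, 1-p)$, logarithms are base 2, and the function names and grid size are arbitrary.

```python
import math

def h2(q: float) -> float:
    """Binary entropy in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def d2(q: float, p: float) -> float:
    """Binary relative entropy D(q||p) in bits, for 0 < p < 1."""
    total = 0.0
    if q > 0.0:
        total += q * math.log2(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log2((1.0 - q) / (1.0 - p))
    return total

def error_exponent(p: float, delta: float, grid: int = 20000) -> float:
    """Grid approximation of min D(q||p) over q with |H(q) - H(p)| >= delta."""
    target = h2(p)
    best = math.inf
    for k in range(grid + 1):
        q = k / grid
        if abs(h2(q) - target) >= delta:
            best = min(best, d2(q, p))
    return best

if __name__ == "__main__":
    for delta in (0.02, 0.05, 0.1):
        print(delta, error_exponent(0.2, delta))
```

The exponent grows as the tolerance $\delta$ grows, because the set of distributions counted as errors shrinks.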



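Theorem 12

$$\varlimsup_{n\to\infty}\frac{-1}{n}\log\max_{U,V}\mathrm{Pr}_{U\otimes V\phi}\left\{\hat{H}_n<H(p_\phi)-\delta\right\}\le\min_{q:\,H(q)\le H(p_\phi)-\delta}D(q\|p_\phi), \qquad (28)$$

$$\varlimsup_{n\to\infty}\frac{-1}{n}\log\max_{U,V}\mathrm{Pr}_{U\otimes V\phi}\left\{\hat{H}_n>H(p_\phi)+\delta\right\}\le\min_{q:\,H(q)\ge H(p_\phi)+\delta}D(q\|p_\phi) \qquad (29)$$

hold for any consistent estimate $\hat{H}_n$ of $H(p_\phi)$ by global measurement, if the Schmidt basis is unknown.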

Proof An argument almost parallel to the one in Subsection IV B implies that we can restrict ourselves to estimates which are computed from the classical output of $C_*^n$.

From here, we use an argument almost parallel to the one in [16]. For $|\psi\rangle$ with $H(p_\psi) < H(p_\phi) - \delta$, consistency of $\hat{H}_n$ implies

$$\mathrm{Pr}_{\phi}\left\{\hat{H}_n<H(p_\phi)-\delta\right\}\ (:=p_n)\to 0,$$

$$\mathrm{Pr}_{\psi}\left\{\hat{H}_n<H(p_\phi)-\delta\right\}\ (:=q_n)\to 1. \qquad (30)$$

On the other hand, monotonicity of relative entropy implies

$$D\left(Q^{\psi}_{C_*^n}\middle\|Q^{\phi}_{C_*^n}\right) \ge D\left(\mathrm{Pr}_{\psi}\{\hat{H}_n\}\middle\|\mathrm{Pr}_{\phi}\{\hat{H}_n\}\right) \ge q_n\log\frac{q_n}{p_n}+(1-q_n)\log\frac{1-q_n}{1-p_n},$$

or, equivalently,

$$\frac{-1}{n}\log p_n\le\frac{1}{nq_n}\left(D\left(Q^{\psi}_{C_*^n}\middle\|Q^{\phi}_{C_*^n}\right)+h(q_n)+(1-q_n)\log(1-p_n)\right)$$

with $h(x) := -x\log x - (1 - x)\log(1 - x)$. With the help of Eqs. (30), letting $n \to \infty$ on both sides of this inequality, we obtain the Bahadur-type inequality [2],

$$\text{LHS of (28)}\le\varlimsup_{n\to\infty}\frac{1}{n}D\left(Q^{\psi}_{C_*^n}\middle\|Q^{\phi}_{C_*^n}\right), \qquad (31)$$

whose RHS equals $D(p_\psi\|p_\phi)$, as in Appendix E. Therefore, choosing $|\psi\rangle$ such that $H(p_\psi)$ is infinitely close to $H(p_\phi) - \delta$, (28) is proved. (29) is proved almost in the same way. □

Proof of (23) with $f(x) = \Theta(x - R)$, $H(p_\phi) > R$

If a protocol $\{C^n\}$ satisfies $\lim_{n\to\infty}\epsilon^{\phi}_{C^n} < 1$ and achieves the rate of the entropy of entanglement then, as is mentioned at the beginning of this section, the corresponding estimate $\hat{H}_{C^n}$ is consistent and satisfies Ineq. (28). This is equivalent to the optimality (23) with $f(x) = \Theta(x - R)$, $H(p_\phi) > R$, because the error probability of $\hat{H}_{C^n}$ equals that of $C^n$. □

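Suppose in addition that the Schmidt basis is known, and let us discuss the first main term of the mean square error,

$$E_{\phi}\left(\hat{H}_n-H(p_\phi)\right)^2=\frac{1}{n}V+o\left(\frac{1}{n}\right),$$

or,

$$V:=\lim_{n\to\infty}nE_{\phi}\left(\hat{H}_n-H(p_\phi)\right)^2.$$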

Theorem 13 Suppose the Schmidt basis of $|\phi\rangle$ is known and its Schmidt coefficients are unknown. Then, any global measurement satisfies

$$V\ge\sum_{i=1}^{d}p_{\phi,i}\left(\log p_{\phi,i}+H(p_\phi)\right)^2, \qquad (32)$$

if $E_{\phi}(\hat{H}_n) \to H(p_\phi)$ $(n \to \infty)$ for all $|\phi\rangle$, and the estimate $\hat{H}_{C_*^n}$ based on $\{C_*^n\}$ achieves the equality.



Proof Consider the family of state vectors $\left\{\sum_i \sqrt{p_{\phi,i}}\,|e_{i,A}\rangle|e_{i,B}\rangle\right\}$, where $\{|e_{i,A}\rangle|e_{i,B}\rangle\}$ is fixed and $p_\phi$ runs over all the probability distributions supported on $\{1, \ldots, d\}$. Due to Theorem 5 in [13], the asymptotically optimal estimate of $H(p_\phi)$ is a function of the data resulting from the projection measurement $\{|e_{i,A}\rangle|e_{j,B}\rangle\}$ on each copy. Therefore, the problem reduces to the optimal estimate of $H(p_\phi)$ from data generated from the probability distribution $p_\phi$.

Due to the asymptotic Cramér-Rao inequality of classical statistics, the asymptotic mean square error of such an estimate is lower-bounded by

$$\sum_{i,j}\left(J^{-1}\right)_{i,j}\frac{\partial H}{\partial p_i}\frac{\partial H}{\partial p_j},$$

where $J$ is the Fisher information matrix of the totality of probability distributions supported on $\{1, \cdots, d\}$. Since $\left(J^{-1}\right)_{i,j} = p_{\phi,i}\delta_{ij} - p_{\phi,i}p_{\phi,j}$, we obtain the lower bound (32).

To prove the achievability, observe that the Schmidt coefficients are exactly the spectrum of the reduced density matrix. As is discussed in [11, 14], the optimal measurement is the projectors $\{\mathcal{W}_{\mathbf{n},A} \otimes \mathcal{W}_{\mathbf{n},B}\}$, which are used in the protocol $\{C_*^n\}$, and the estimate of the spectrum is $\frac{\mathbf{n}}{n}$. It has been shown that the asymptotic mean square error matrix of this estimate equals $\frac{1}{n}J^{-1} + o\left(\frac{1}{n}\right)$. Hence, if we estimate $H(p_\phi)$ by $H\left(\frac{\mathbf{n}}{n}\right)$, we can achieve the lower bound, as is easily checked by using Taylor's expansion. Now, due to (3), our estimate $\frac{1}{n}\log\dim\mathcal{V}_{\mathbf{n}}$ differs from $H\left(\frac{\mathbf{n}}{n}\right)$ at most by $O\left(\frac{\log n}{n}\right)$. Therefore, their mean square errors differ only by $o\left(\frac{1}{n}\right)$. As a result, the estimate based on $\{C_*^n\}$ achieves the lower bound (32). □
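The Cramér-Rao quantity in this proof can be evaluated directly. The sketch below is an illustration under stated assumptions (base-2 logarithms, a hypothetical example distribution): it computes the quadratic form $\sum_{i,j}(J^{-1})_{i,j}\,\partial_i H\,\partial_j H$ with $(J^{-1})_{i,j} = p_i\delta_{ij} - p_ip_j$, and compares it with the variance of $-\log p_i$ under $p$, which is the form the bound takes after simplification.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits of a strictly positive distribution p."""
    return -sum(pi * math.log2(pi) for pi in p)

def cramer_rao_bound(p):
    """grad(H)^T J^{-1} grad(H), with (J^{-1})_{ij} = p_i delta_ij - p_i p_j."""
    # dH/dp_i = -log2(p_i) - log2(e); the constant shift cancels in the form below.
    grad = [-math.log2(pi) - math.log2(math.e) for pi in p]
    total = 0.0
    for i, pi in enumerate(p):
        for j, pj in enumerate(p):
            j_inv = (pi if i == j else 0.0) - pi * pj
            total += grad[i] * j_inv * grad[j]
    return total

def variance_of_surprisal(p):
    """sum_i p_i (log2 p_i + H(p))^2, the variance of -log2 p_i under p."""
    h = entropy_bits(p)
    return sum(pi * (math.log2(pi) + h) ** 2 for pi in p)

if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # hypothetical Schmidt coefficients
    print(cramer_rao_bound(p), variance_of_surprisal(p))
```

For the uniform distribution both expressions vanish, reflecting the fact that the entropy is stationary at its maximum.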



VI. CONCLUSIONS AND DISCUSSIONS 

We have proposed a new protocol of entanglement concentration $\{C_*^n\}$, which has the following properties.

1. The input state is many copies of an unknown pure state.

2. The output is the exact maximally entangled state, together with its Schmidt rank.

3. Its performance is probabilistic, and the entropy rate is asymptotically achieved.

4. Any protocol is no better than a protocol given by modification of the protocol $\{C_*^n\}$ in its classical output only.

5. The protocol is optimal up to higher orders or non-asymptotically, depending on the measure.

6. No classical communication is needed.

7. The classical output gives the estimate of the entropy of entanglement with minimum asymptotic error, where the minimum is taken over all the global measurements.

The key to the optimality arguments is Lemma 3, which implies item 4 above and drastically simplifies the arguments. As is pointed out throughout the paper, almost all the statements of optimality, except for Theorem 9, generalize to the case where the Schmidt coefficients are known.

As a measure of the distortion, we considered the worst-case distortion and the average distortion. Trivially, the former constraint is stronger, and thus the proof of optimality was technically much simpler and the results are stronger. The question is which one is more natural. This is a very subtle problem, but we think that the constraint on the average distortion is too generous. The reason is as follows. Under this constraint, the strong converse probability decreases very slowly (Theorem 7, (ii)). It is easy to generalize this statement to non-universal entanglement concentration. This is in sharp contrast with the fact that the strong converse probability converges exponentially fast in many other information-theoretic problems.

Toward the end of Section IV F, using the linear programming approach, we gave another proof of Theorem 7. A similar proof of Theorem 9 is possible using the Lagrangian

£': = minE* ^/(X) - A max (e^f v ^ - r n ) , 

with H (p^,) slightly smaller than R. In addition, using the Lagrangian 

£": = 

x \ 1 

^ E Qgr* j +X n x (r n - (2 nX \\pVf v *(X)\\2 nX )) j ' 

one can give another proof of the optimality results with the constraint on the worst-case distortion. It has been pointed out by many authors that the theory of linear programming, especially the duality theorem, supplies a strong mathematical tool to obtain upper and lower bounds. Our case is one such example.

Almost in parallel with the arguments in this paper, we can prove the optimality of the BBPS protocol among all the protocols which do not use information about the phases of the Schmidt basis and the Schmidt coefficients. For that, we just have to replace the average over all the local unitaries in our arguments with the one over the phases. This average kills all the coherence between typical subspaces, changing the state to the direct sum of maximally entangled states, and we obtain an equivalent of our key lemma. The rest of the arguments are also parallel, for an equivalent of (3) holds due to type-theoretic arguments.

In this paper, we discussed universal entanglement concentration only, but the importance of universal entanglement distillation is obvious. This topic has already been studied by some authors [4, 17], but the optimality of their protocols, among other questions, is left for future study.

Another possible future direction is to explore new applications of the measurement used in our protocol. This measurement had already been applied to the estimation of the spectrum [11, 14] and to universal data compression [8, 10]. In addition, after the appearance of the first draft [9] of this paper, a polynomial-size circuit for this measurement was proposed [1], meaning that this measurement can be realized efficiently by forthcoming quantum computers.
APPENDIX A: GROUP REPRESENTATION THEORY

Lemma 14 Let $U_g$ and $U'_g$ be irreducible representations of $G$ on the finite-dimensional spaces $\mathcal{H}$ and $\mathcal{H}'$, respectively. We further assume that $U_g$ and $U'_g$ are not equivalent. If a linear operator $A$ in $\mathcal{H} \oplus \mathcal{H}'$ is invariant by the transform $A \to (U_g \oplus U'_g) A (U_g^* \oplus U_g'^*)$ for any $g$, $\mathcal{H}A\mathcal{H}' = 0$. [5]

Lemma 15 (Schur's lemma [5]) Let $U_g$ be as defined in Lemma 14. If a linear map $A$ in $\mathcal{H}$ is invariant by the transform $A \to U_g A U_g^*$ for any $g$, $A = c\,\mathrm{Id}_{\mathcal{H}}$.

Lemma 16 Let $U_g$ be an irreducible representation of $G$ on the finite-dimensional space $\mathcal{H}$, and let $A$ be a linear map in $\mathcal{K} \otimes \mathcal{H}$. If $A$ is invariant by the transform $A \to (I \otimes U_g) A (I \otimes U_g^*)$ for any $g$, $A$ is of the form $A' \otimes \mathrm{Id}_{\mathcal{H}}$, with $A'$ a linear map in $\mathcal{K}$.

Proof Write $A = \sum_i A_i \otimes B_i$. Due to Schur's lemma, $B_i = c_i\,\mathrm{Id}_{\mathcal{H}}$. Therefore,

$$A = \sum_i A_i \otimes c_i\,\mathrm{Id}_{\mathcal{H}} = \left(\sum_i c_i A_i\right) \otimes \mathrm{Id}_{\mathcal{H}},$$

and we have the lemma. □

Lemma 17 If the representation $U_g$ ($U'_h$, resp.) of $G$ ($H$, resp.) on the finite-dimensional space $\mathcal{H}$ ($\mathcal{H}'$, resp.) is irreducible, the representation $U_g \times U'_h$ of the group $G \times H$ in the space $\mathcal{H} \otimes \mathcal{H}'$ is also irreducible.

Proof Assume that the representation $U_g \times U'_h$ is reducible, i.e., $\mathcal{H} \otimes \mathcal{H}'$ has an irreducible subspace $\mathcal{K}$. Denoting the Haar measures on $G$ and $H$ by $\mu(dg)$ and $\nu(dh)$ respectively, Schur's lemma yields

$$\int U_g \otimes U'_h\,|\phi\rangle\langle\phi|\,U_g^* \otimes U_h'^*\,\mu(dg)\nu(dh) = c\,\mathrm{Id}_{\mathcal{H}\otimes\mathcal{H}'},$$

for the LHS is invariant by both $U_g \cdot U_g^*$ and $U'_h \cdot U_h'^*$. This equation leads to $\int |\langle\psi|U_g \otimes U'_h|\phi\rangle|^2\mu(dg)\nu(dh) = c$ and $c\dim\mathcal{H}\dim\mathcal{H}' = \mu(G)\nu(H)\langle\phi|\phi\rangle$. Choosing $|\psi\rangle$ from $\mathcal{K}$, the former equation gives $c = 0$, which contradicts the latter. □



APPENDIX B: ASYMPTOTIC YIELD OF BBPS PROTOCOL 



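From here to the end of the paper, we use the following notation.

$$\mathbf{n}! := \prod_i n_i!, \qquad p_i := p_{\phi,i}, \qquad \mathbf{p} = (p_1,\cdots,p_d).$$

In this section, we compute the yield of the BBPS protocol,

$$E_{\mathbf{p}}\left[\frac{1}{n}\log\frac{n!}{\mathbf{n}!}\right],$$

up to $O\left(\frac{1}{n}\right)$. Here, $E_{\mathbf{p}}$ denotes the average in terms of the probability distribution

$$\mathbf{n} = (n_1,\cdots,n_d)\sim \frac{n!}{\mathbf{n}!}\prod_{i=1}^{d} p_i^{n_i}.$$

Below, we assume $p_i \neq 0$. Due to Stirling's formula $n! = \sqrt{2\pi n}\,n^n e^{-n}\left(1+O\left(\frac{1}{n}\right)\right)$,

$$\frac{1}{n}\log\frac{n!}{\mathbf{n}!} = H\left(\frac{\mathbf{n}}{n}\right) - \frac{d-1}{2}\frac{\log n}{n} - \frac{d-1}{2n}\log 2\pi - \frac{1}{2n}\sum_{i=1}^{d}\log\frac{n_i}{n} + R_1(\mathbf{n}),$$

where $R_1(\mathbf{n}) = \frac{1}{n}O\left(\max\left\{\frac{1}{n_1},\cdots,\frac{1}{n_d}\right\}\right)$. Consider the Taylor expansion

$$H(\mathbf{q}) = H(\mathbf{p}) + \sum_{i=1}^{d} \frac{\partial H(\mathbf{p})}{\partial p_i}(q_i-p_i) - \frac{\log e}{2}\sum_{i=1}^{d}\frac{(q_i-p_i)^2}{p_i} + R_2(\mathbf{q},\mathbf{p}),$$

and let $R(\mathbf{q},\mathbf{p}) := R_1(\mathbf{n})+R_2(\mathbf{q},\mathbf{p})$. Since $E_{\mathbf{p}}f\left(\frac{\mathbf{n}}{n}\right) = f(\mathbf{p})+o(1)$ and $E_{\mathbf{p}}n_i^2 = n(n-1)p_i^2+np_i$, we have

$$E_{\mathbf{p}}\left[\frac{1}{n}\log\frac{n!}{\mathbf{n}!}\right] = H(\mathbf{p}) - \frac{d-1}{2}\frac{\log n}{n} - \frac{d-1}{2n}\log 2\pi e - \frac{1}{2n}\sum_{i=1}^{d}\log p_i + E_{\mathbf{p}}R\left(\frac{\mathbf{n}}{n},\mathbf{p}\right)+o(n^{-1}).$$

Since

$$E_{\mathbf{p}}R\left(\frac{\mathbf{n}}{n},\mathbf{p}\right) = o(n^{-1}) \qquad (B1)$$

holds, as is proved below, our calculation is complete.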

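Because $\frac{1}{n}\log\frac{n!}{\mathbf{n}!}$ is bounded by a constant, $R\left(\frac{\mathbf{n}}{n},\mathbf{p}\right)$ is bounded by a polynomial function of $n$. Hence, due to the type theory,

$$\left|E_{\mathbf{p}}R\left(\frac{\mathbf{n}}{n},\mathbf{p}\right)\right| \le E_{\mathbf{p}}\left[\left|R\left(\frac{\mathbf{n}}{n},\mathbf{p}\right)\right|;\ \left\|\frac{\mathbf{n}}{n}-\mathbf{p}\right\|<\delta\right] + \mathrm{poly}(n)\,2^{-nD(\delta)},$$

where $D(\delta) := \min_{\mathbf{q}:\,\|\mathbf{q}-\mathbf{p}\|\ge\delta}D(\mathbf{q}\|\mathbf{p})$. In the region $\left\{\mathbf{n} : \left\|\frac{\mathbf{n}}{n}-\mathbf{p}\right\|<\delta\right\}$,

$$|R_1| = O\left(\frac{1}{n^2}\right),\qquad \left|R_2\left(\frac{\mathbf{n}}{n},\mathbf{p}\right)\right| \le \frac{d^3}{3!}\max_{i,j,k,\ \mathbf{p}_0:\,\|\mathbf{p}_0-\mathbf{p}\|\le\delta}\left|\frac{\partial^3H(\mathbf{p}_0)}{\partial p_i\partial p_j\partial p_k}\right|\delta^3.$$

Observe also $D(\delta) = O(\delta^2)$. Hence, if $\delta = n^{-3/8}$, $\left|R_2\left(\frac{\mathbf{n}}{n},\mathbf{p}\right)\right| = o(n^{-1})$ in the region $\left\{\mathbf{n} : \left\|\frac{\mathbf{n}}{n}-\mathbf{p}\right\|<\delta\right\}$, and $2^{-nD(\delta)} = 2^{-O(n^{2/8})}$, implying (B1).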
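For $d = 2$ the expansion of this appendix can be compared against an exact computation. The sketch below is a hedged illustration: it assumes base-2 logarithms and a binomial distribution of the type $\mathbf{n} = (k, n-k)$, evaluates $E_{\mathbf{p}}\left[\frac{1}{n}\log\frac{n!}{\mathbf{n}!}\right]$ exactly by summation, and compares it with $H(\mathbf{p}) - \frac{d-1}{2}\frac{\log n}{n} - \frac{d-1}{2n}\log 2\pi e - \frac{1}{2n}\sum_i \log p_i$. The function names are hypothetical.

```python
import math

LN2 = math.log(2.0)

def log2_binom(n: int, k: int) -> float:
    """log2 of the binomial coefficient, computed through lgamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / LN2

def exact_bbps_yield(n: int, p: float) -> float:
    """Exact E[(1/n) log2 C(n, K)] for K ~ Binomial(n, p), 0 < p < 1."""
    total = 0.0
    for k in range(n + 1):
        log_pmf = (log2_binom(n, k) * LN2
                   + k * math.log(p) + (n - k) * math.log(1.0 - p))
        total += math.exp(log_pmf) * log2_binom(n, k) / n
    return total

def asymptotic_bbps_yield(n: int, p: float) -> float:
    """Expansion of this appendix for d = 2, without the o(1/n) remainder."""
    d = 2
    h = -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)
    return (h
            - (d - 1) / 2.0 * math.log2(n) / n
            - (d - 1) / (2.0 * n) * math.log2(2.0 * math.pi * math.e)
            - (math.log2(p) + math.log2(1.0 - p)) / (2.0 * n))

if __name__ == "__main__":
    for n in (100, 1000, 2000):
        print(n, exact_bbps_yield(n, 0.3), asymptotic_bbps_yield(n, 0.3))
```

The gap between the two values, multiplied by $n$, should shrink as $n$ grows, which is what the $o(n^{-1})$ remainder asserts.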

APPENDIX C: DIFFERENCE BETWEEN THE AVERAGE YIELD OF BBPS PROTOCOL AND $\{C_*^n\}$
dimVn and are explicitly given as follows. 



{n + S)\ 
dim V n 



[[ (m - rij 



i,j:i>j 



n ^ -pj) 

l,j:i>j 



where 



(CI) 
(C2) 



6 : = (d- l,d- 2,--- ,0), 
7r(«S) : = • • ,^(d)) • 

Below, 7^ P?, and pf 7^ are assumed for simplicity. The average yield equals 



iVa£log(dimV n ) 
n L — ' 

n 

= - — pi— r V Y sgnC^o) 

n n (Pi-Pj),^ 



xEI]^ ,i)+{ * 0( " 

n i 



(dimV n )log(dimV n ), 



where n is summed over the region satisfying (2). In the sum over tto, we first compute the term for 7r = id (other 
terms will turn out to be exponentially small): 



E J[pT +S * (dim V n ) log(dim V„) 

11 i 

S„ i n i 

E^wn^'En^^ 



xlogJ sgn(-7r')— — 



where n" is defined by n" := n + 6 — n(S). For the probability sharply concentrates at the neighborhood of 
we have, 



EIR£/ (t )=/(p)+o(2-»). 

The main part of (C3) rewrites, 



I f log^TT 



Ell^^i +log E sgn(^ 



(n*+,r(«)-7r'(«))l 

ii i \ 7T tin ' 

The first term is exponentially close to n times the average yield of BBPS protocol. The second term is, 

7.1 - , n»! 



En^^yiog E s^ T^T^rk 



2^ 11 Pi ^ 



n"! ° v / (n' r +7r(5)-7r / («))! 

x 



n n K-i+i) 

log ^ sgn(Tr) 

T ' e5 - n n 

(i) >0 j=l 

,5, 

= log E sgn(Tr') 1 ' ' (,) "' (,) + (1) 

^'es„ n P 4 

»:<5„( 4 )-(5„/ w >0 

= log E sgn( 7 r')n^' (l, ^ <1> +°( 1 ) 

= log n (p l -Pj)\\pi Kik) +o(i), 

i,j:i>j k 



where the second equation is due to (C4). 
To sum up, the term for n = id is 
n (pi-pj)

i,j:i>j 



i j log <! n fe-p,)rW- (fc) 

+0(1) 

+average yield of BBPS 
sgn(7r)]J 2 j^ ( ' ) 




i,j:i>j 



r(k) 



xiog<{ n (Pi-pj)iiPk' 

J,j:i>j k 

+average yield of BBPS + o I — 

n 



The terms for ttq ^id are of the form 



Observe that the probability distribution 1]?"^^ is concentrated around n" = nir 1 (p), which is not close to the 

i 

region where n" takes its value. Hence, for f(x) is bounded by a constant, due to the large deviation principles, this 
sum should be exponentially small. 



APPENDIX D: ASYMPTOTIC AVERAGE YIELD OF THE OPTIMAL NON-UNIVERSAL CONCENTRATION

In this section, we discuss the asymptotic performance of an optimal non-universal entanglement concentration, i.e., an optimal entanglement concentration for a known input state. In terms of error probability, intensive research has been done in [15]. Here, our concern is the average yield of the optimal protocol.

For that, we use Hardy's formula [6]: the average number of Bell pairs concentrated from a known pure state is

$$\sum_{i=0}^{m}(\alpha_i-\alpha_{i+1})\,T_i\log T_i, \qquad (D1)$$

where $\alpha_i$ is a Schmidt coefficient (in decreasing order, with the convention $\alpha_{m+1} = 0$), and $T_i$ is the number of Schmidt basis vectors whose corresponding Schmidt coefficient is larger than or equal to $\alpha_i$.
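Formula (D1) can be evaluated exactly for $n$ copies of a qubit pair. The following sketch is an illustration under stated assumptions: the single-copy Schmidt coefficients are $(p, 1-p)$ with $p \ge 1/2$, logarithms are base 2, and degenerate coefficients are grouped into levels, since within a level the differences $\alpha_i - \alpha_{i+1}$ vanish and only the last index of each level contributes. The function name is hypothetical.

```python
import math

def hardy_average_yield(n: int, p: float) -> float:
    """Average number of Bell pairs from n copies, via formula (D1).

    Level j has coefficient p^j (1-p)^(n-j) with multiplicity C(n, j);
    for p >= 1/2 the levels are in decreasing order as j runs from n to 0.
    """
    q = 1.0 - p
    total = 0.0
    count = 0  # number of coefficients >= the current level (T in (D1))
    for j in range(n, -1, -1):
        count += math.comb(n, j)
        level = p ** j * q ** (n - j)
        next_level = p ** (j - 1) * q ** (n - j + 1) if j > 0 else 0.0
        total += (level - next_level) * count * math.log2(count)
    return total

if __name__ == "__main__":
    for n in (1, 10, 50):
        h = -0.7 * math.log2(0.7) - 0.3 * math.log2(0.3)
        print(n, hardy_average_yield(n, 0.7), n * h)
```

A single copy with coefficients $(0.75, 0.25)$ gives an average of $0.5$ Bell pairs, and $n$ maximally entangled copies give exactly $n$.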

In this appendix, we evaluate (D1) asymptotically for qubit systems, assuming that the input state is $|\phi\rangle = \sqrt{q}\,|0\rangle + \sqrt{p}\,|1\rangle$ with $p > q$. Below, we always neglect $o(1)$ terms, unless otherwise mentioned, because this quantity is the yield multiplied by $n$.



n-l 

' n 



j=o j=o fc=0 

n— 1 i / \ i / 

n 
k 



V y / i=0 j=0 vy/ fc=0 

n— 1 n / \ i 

\ n-l / \ n-j , N i , ; 

1 " «) 0) £ (?) 108 § CD 



q) \i J \k, 

* ' j=0 i=j KJ ' k=Q v y 

n-l / \ n-j , x i j+l 

E^r)E(-) lo §E 

Since $\sum_{l=0}^{n-j}\left(\frac{p}{q}\right)^{l}\log\sum_{k=0}^{j+l}\binom{n}{k}$ is at most of polynomial order, the large deviation principle allows the range of $j$ to be replaced by $[n(p-\delta),\,n(p+\delta)]$. In this region, $j+l$ is of the order of $n$. Also, the range of $l$ can be replaced by $[0, n^{1/2})$, for

$$\sum_{l=n^{1/2}}^{n-j}\left(\frac{p}{q}\right)^{l}\log\sum_{k=0}^{j+l}\binom{n}{k} \le \left(\frac{p}{q}\right)^{n^{1/2}}\times n\times \mathrm{poly}(n) = o(1).$$

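Hence, we evaluate $\sum_{k=0}^{np'}\binom{n}{k}\big/\binom{n}{np'}$ with $p'<\frac{1}{2}$. First, using $\frac{1}{n+1}2^{nh(k/n)}\le\binom{n}{k}\le 2^{nh(k/n)}$, we upper-bound $\sum_{k=0}^{np'-n^{1/3}}\binom{n}{k}\big/\binom{n}{np'}$ by

$$\frac{(n+1)\,2^{nh(p'-n^{-2/3})}}{\frac{1}{n+1}\,2^{nh(p')}} \le (n+1)^{2}\,2^{-O(n^{1/3})},$$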

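meaning this part is negligible. Hence, we have

$$\frac{\sum_{k=0}^{np'}\binom{n}{k}}{\binom{n}{np'}} = \frac{\sum_{k=0}^{n^{1/3}}\binom{n}{np'-k}}{\binom{n}{np'}} = \sum_{k=0}^{n^{1/3}}\frac{(np')!\,(nq')!}{(np'-k)!\,(nq'+k)!} = \sum_{k=0}^{n^{1/3}}\prod_{i=1}^{k}\frac{np'-i+1}{nq'+i} = \frac{q'}{q'-p'}+o(1),$$

where $q' := 1-p'$. Hence, the average yield is,

$$\left(1-\frac{p}{q}\right)\sum_{l=0}^{n^{1/2}}\left(\frac{p}{q}\right)^{l}\sum_{j}\binom{n}{j}p^{j}q^{n-j}\left\{\log\binom{n}{j+l}+\log\frac{1-\frac{j+l}{n}}{1-\frac{2(j+l)}{n}}\right\}. \qquad \text{(D2)}$$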
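The ratio of a binomial tail sum to its largest term can be probed numerically. The sketch below assumes the limiting value is the geometric-series sum $(1-p')/(1-2p')$ for $p' < 1/2$, which is consistent with the factor $1-2(p+l/n)$ appearing in the second term of the yield. The names `tail_ratio` and `geometric_limit` are hypothetical.

```python
from math import comb


def tail_ratio(n, m):
    """Return sum_{k=0}^{m} C(n, k) / C(n, m) using exact integers."""
    tail = sum(comb(n, k) for k in range(m + 1))
    return tail / comb(n, m)  # true division of big ints gives a float


def geometric_limit(p_prime):
    """Assumed limit sum_k (p'/q')^k = (1 - p') / (1 - 2 p') for p' < 1/2."""
    return (1.0 - p_prime) / (1.0 - 2.0 * p_prime)


if __name__ == "__main__":
    for n in (100, 1000, 2000):
        m = n // 5  # p' = 0.2
        print(n, tail_ratio(n, m), geometric_limit(0.2))
```

The agreement improves as $n$ grows, since each factor $(np'-i+1)/(nq'+i)$ approaches $p'/q'$ for fixed $i$.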



The second term of this is evaluated by using the following identity,

$$\left(1-\frac{p}{q}\right)\sum_{l=0}^{n^{1/2}}\left(\frac{p}{q}\right)^{l} f\!\left(x+\frac{l}{n}\right) = f(x)+o(1), \qquad \text{(D3)}$$
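where $f$ is continuous and bounded by a polynomial function. This identity holds true because of the upper bound on the left-hand side,

$$\left(1-\frac{p}{q}\right)\sum_{l=0}^{n^{1/2}}\left(\frac{p}{q}\right)^{l} f\!\left(x+\frac{l}{n}\right) \le \left(1-\frac{p}{q}\right)\sum_{l=0}^{n^{1/2}}\left(\frac{p}{q}\right)^{l}\max_{y:\,y\in[0,n^{-1/2}]} f(x+y) \le \max_{y:\,y\in[0,n^{-1/2}]} f(x+y),$$

and the lower bound on the left-hand side,

$$\left(1-\frac{p}{q}\right)\sum_{l=0}^{n^{1/2}}\left(\frac{p}{q}\right)^{l} f\!\left(x+\frac{l}{n}\right) \ge \left(1-\frac{p}{q}\right)\sum_{l=0}^{n^{1/2}}\left(\frac{p}{q}\right)^{l}\min_{y:\,y\in[0,n^{-1/2}]} f(x+y) = \left(1-\left(\frac{p}{q}\right)^{n^{1/2}+1}\right)\min_{y:\,y\in[0,n^{-1/2}]} f(x+y).$$

Both bounds converge to $f(x)$ as $n\to\infty$, because $f$ is continuous. Hence, the second term of (D2), or

$$\left(1-\frac{p}{q}\right)\sum_{l=0}^{n^{1/2}}\left(\frac{p}{q}\right)^{l}\log\frac{1-\left(p+\frac{l}{n}\right)}{1-2\left(p+\frac{l}{n}\right)},$$

equals, due to Eq. (D3), $\log\frac{q}{q-p}+o(1)$.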
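The geometric averaging in Eq. (D3) can be illustrated numerically. The sketch below assumes the weights are $(1-r)r^{l}$ with $r = p/q < 1$ and the sum runs up to $l = \lfloor n^{1/2}\rfloor$. The test function is taken to be $f(x)=\log_2\frac{1-x}{1-2x}$, whose value at $x=p$ is $\log_2\frac{q}{q-p}$. The function names are hypothetical.

```python
from math import isqrt, log2


def geometric_average(f, x, n, r):
    """Return (1 - r) * sum_{l=0}^{floor(sqrt(n))} r^l * f(x + l/n)."""
    upper = isqrt(n)
    return (1.0 - r) * sum(r ** l * f(x + l / n) for l in range(upper + 1))


def second_term_integrand(x):
    """f(x) = log2((1 - x) / (1 - 2x)), defined for x < 1/2."""
    return log2((1.0 - x) / (1.0 - 2.0 * x))


if __name__ == "__main__":
    p = 0.3
    q = 1.0 - p
    for n in (10**2, 10**4, 10**6):
        print(n, geometric_average(second_term_integrand, p, n, p / q),
              second_term_integrand(p))
```

The printed values approach $f(p)$ because the weights concentrate on $l = O(1)$, where the shift $l/n$ is negligible.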



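The first term of (D2) is evaluated as follows. Due to Stirling's formula and Taylor's expansion,

$$
\begin{aligned}
\log\binom{n}{j+l} ={}& nh(p) - (\log p+\log e)(j+l-np)\\
&- (\log q+\log e)(n-j-l-nq)\\
&+\frac{\log e}{2}\left(n-\frac{(j+l)^{2}}{pn}-\frac{(n-j-l)^{2}}{qn}\right) + nR_2\!\left(\frac{j+l}{n},p\right)\\
&-\frac{\log n}{2}-\frac{1}{2}\left(\log 2\pi+\log\frac{j+l}{n}+\log\left(1-\frac{j+l}{n}\right)\right)+nR_1(n),
\end{aligned}
$$

whose average by the binomial distribution $p^{j}q^{n-j}\binom{n}{j}$ is,

$$
\begin{aligned}
& nh(p)-(\log p-\log q)\,l-\frac{\log e}{2}\cdot\frac{l^{2}}{npq}-\frac{\log n}{2}\\
&-\frac{1}{2}\left(\log 2\pi e+\log\left(p+\frac{l}{n}\right)+\log\left(q-\frac{l}{n}\right)\right)\\
&+nR_1(n)+nR_2\!\left(\frac{j+l}{n},p\right).
\end{aligned}
$$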
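The Stirling step can be checked against exact binomial coefficients. The sketch below compares $\log_2\binom{n}{m}$ with the leading terms $nh(m/n)-\frac{1}{2}\log_2\!\left(2\pi n\,\frac{m}{n}\left(1-\frac{m}{n}\right)\right)$, where $h$ is assumed to be the binary entropy and logarithms are assumed to be base 2. The function names are hypothetical, and the remainder is simply the difference of the two values.

```python
from math import comb, log2, pi


def binary_entropy(x):
    """h(x) = -x log2 x - (1 - x) log2(1 - x) for 0 < x < 1."""
    return -x * log2(x) - (1.0 - x) * log2(1.0 - x)


def stirling_log_binom(n, m):
    """Leading Stirling terms of log2 C(n, m), valid for 0 < m < n."""
    x = m / n
    return n * binary_entropy(x) - 0.5 * log2(2.0 * pi * n * x * (1.0 - x))


if __name__ == "__main__":
    for n, m in ((100, 30), (1000, 300), (10000, 3000)):
        exact = log2(comb(n, m))  # log2 accepts arbitrarily large integers
        print(n, m, exact, stirling_log_binom(n, m))
```

The gap between the two columns shrinks like $1/n$, which is why the remainder term becomes negligible once the expression is averaged.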

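Due to Eq. (D3), after weighting by $\left(1-\frac{p}{q}\right)\left(\frac{p}{q}\right)^{l}$ and summing over $l$, the first term is obtained as:

$$
\begin{aligned}
& nh(p)-\frac{\log n}{2}-\frac{1}{2}\left(\log 2\pi e+\log p+\log q\right)\\
&-(\log p-\log q)\frac{p}{q-p}+nR_1(n)+nR_2\!\left(\frac{j+l}{n},p\right).
\end{aligned}
$$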
We can prove that $nR_1(n)+nR_2\!\left(\frac{j+l}{n},p\right)$ is negligible in almost the same way as in Appendix B. After all, the average yield is,

$$nh(p)-\frac{\log n}{2}-\frac{1}{2}\left(\log 2\pi e+\log p+\log q\right)-(\log p-\log q)\frac{p}{q-p}+\log\frac{q}{q-p}.$$



APPENDIX E: $\lim_{n\to\infty}\frac{1}{n}D(Q^{n}_{q}\,\|\,Q^{n}_{p})$



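Let us define,

$$f(\mathbf{l}) := \log\frac{\prod_{i,j:i>j}(p_i-p_j)\sum_{\pi\in S_n}\mathrm{sgn}(\pi)\prod_i q_{\pi(i)}^{l_i}}{\prod_{i,j:i>j}(q_i-q_j)\sum_{\pi\in S_n}\mathrm{sgn}(\pi)\prod_i p_{\pi(i)}^{l_i}}, \qquad l_i := n_i+\delta_i,$$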



and our task is to compute, 



7re,S„ i 

Due to the argument stated at the end of Appendix C, in the first sum over $\pi \in S_n$, we can concentrate on the term $\pi = \mathrm{id}$, which is evaluated as follows.

EdimV n -p-r ; 



i,j:i>j 



y L 



(n+^il)! 



, II (Qi-Qj) I' 

l,j:i>j 

n ai-ii) 



d(d-l) 



ft (n+^-i) 

/ff«+^V^)<l)+o(/(")) 



II (* - 1j) , , . , , 
: ;j-i>j fit d(d-l) 



n - * 

i,j:i>j 



= J(h + ^]q]+o(/W) 



Here, the second equation was derived as follows. We first extended the region of $\mathbf{l}$ to $\left\{\mathbf{l}\,;\ \sum_i l_i = n+\frac{d(d-1)}{2}\right\}$, because this causes only an exponentially small difference due to the large deviation principle. Then, we applied the law of large numbers. Observe that

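$$
\begin{aligned}
f(\mathbf{l}) ={}& \log\frac{\prod_{i,j:i>j}(p_i-p_j)}{\prod_{i,j:i>j}(q_i-q_j)}+\log\frac{\prod_i q_i^{l_i}}{\prod_i p_i^{l_i}}\\
&+\log\frac{1+\sum_{\pi\in S_n,\,\pi\neq\mathrm{id}}\mathrm{sgn}(\pi)\prod_i\left(q_{\pi(i)}/q_i\right)^{l_i}}{1+\sum_{\pi\in S_n,\,\pi\neq\mathrm{id}}\mathrm{sgn}(\pi)\prod_i\left(p_{\pi(i)}/p_i\right)^{l_i}}\\
={}& \log\frac{\prod_i q_i^{l_i}}{\prod_i p_i^{l_i}}+O(1),
\end{aligned}
$$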



where the second equation is true due to the inequality, 



i<i+ e ^wn^ 41 u <i+d\. 



7r^id 



After all, we have, 



d(d-l)N n ' /; ' 



n+ ^-^J log iw +0(1) 



$nD(q\|p)+O(1)$.
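For reference, the leading coefficient $D(q\|p)$ can be evaluated as in the following sketch. It assumes that $q$ and $p$ are probability vectors (for example, squared Schmidt coefficients) and that the logarithm is base 2. The function name `relative_entropy` is hypothetical.

```python
from math import log2


def relative_entropy(q, p):
    """D(q || p) = sum_i q_i log2(q_i / p_i), with the convention 0 log 0 = 0.

    Assumes p_i > 0 wherever q_i > 0.
    """
    return sum(qi * log2(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


if __name__ == "__main__":
    q = [0.5, 0.5]
    p = [0.25, 0.75]
    n = 1000
    # Leading-order value n * D(q || p); the O(1) correction is not modelled.
    print(n * relative_entropy(q, p))
```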



